Mini-dystrophin nucleic acid and peptide sequences

ABSTRACT

The present invention relates to compositions and methods for expressing mini-dystrophin peptides. In particular, the present invention provides compositions comprising nucleic acid sequences that are shorter than wild-type dystrophin cDNA and that express mini-dystrophin peptides that function in a similar manner as wild-type dystrophin proteins. The present invention also provides compositions comprising mini-dystrophin peptides, and methods for expressing mini-dystrophin peptides in target cells.

[0001] The present Application claims priority to U.S. Provisional Application Serial No. 60/238,848, filed Oct. 6, 2000, hereby incorporated by reference.

[0002] This invention was made with Government support under contract NIH R01AR40864-10. The government has certain rights in this invention.

FIELD OF THE INVENTION

[0003] The present invention relates to compositions and methods for expressing mini-dystrophin peptides. In particular, the present invention provides compositions comprising nucleic acid sequences that are shorter than wild-type dystrophin cDNA and that express mini-dystrophin peptides that function in a similar manner as wild-type dystrophin proteins. The present invention also provides compositions comprising mini-dystrophin peptides, and methods for expressing mini-dystrophin peptides in target cells.

BACKGROUND OF THE INVENTION

[0004] Muscular dystrophy is a group of inherited disorders characterized by progressive muscle weakness and loss of muscle tissue. Muscular dystrophies include many inherited disorders, including Becker's muscular dystrophy and Duchenne's muscular dystrophy, which are both caused by mutations in the dystrophin gene. Both of the disorders have similar symptoms, although Becker's muscular dystrophy is a slower progressing form of the disease. Duchenne's muscular dystrophy is a rapidly progressive form of muscular dystrophy.

[0005] Both disorders are characterized by progressive muscle weakness of the legs and pelvis which is associated with a loss of muscle mass (wasting). Muscle weakness also occurs in the arms, neck, and other areas, but not as severely as in the lower half of the body. Calf muscles initially enlarge (an attempt by the body to compensate for loss of muscle strength), but the enlarged muscle tissue is eventually replaced by fat and connective tissue (pseudohypertrophy). Muscle contractures occur in the legs and heels, causing inability to use the muscles because of shortening of muscle fibers and fibrosis of connective tissue. Bones develop abnormally, causing skeletal deformities of the chest and other areas. Cardiomyopathy occurs in almost all cases. Mental retardation may accompany the disorder but it is not inevitable and does not worsen as the disorder progresses. The cause of this impairment is unknown. Becker's muscular dystrophy occurs in approximately 3 out of 100,000 people. Symptoms usually appear in men between the ages of 7 and 26. Women rarely develop symptoms. There is no known cure for Becker's muscular dystrophy. Treatment is aimed at control of symptoms to maximize the quality of life. Activity is encouraged. Inactivity (such as bed rest) can worsen the muscle disease. Physical therapy may be helpful to maintain muscle strength. Orthopedic appliances such as braces and wheelchairs may improve mobility and self-care. Becker's muscular dystrophy results in slowly progressive disability. A normal life span is possible; however, death usually occurs after age 40.

[0006] Duchenne's muscular dystrophy occurs in approximately 2 out of 10,000 people. Symptoms usually appear in males 1 to 6 years old. Females are carriers of the gene for this disorder but rarely develop symptoms. There is no known cure for Duchenne's muscular dystrophy. Treatment is aimed at control of symptoms to maximize the quality of life. Activity is encouraged. Inactivity (such as bed rest) can worsen the muscle disease. Physical therapy may be helpful to maintain muscle strength and function. Orthopedic appliances such as braces and wheelchairs may improve mobility and the ability for self-care. Duchenne's muscular dystrophy results in rapidly progressive disability. By age 10, braces may be required for walking, and by age 12, most patients are confined to a wheelchair. Bones develop abnormally, causing skeletal deformities of the chest and other areas. Muscular weakness and skeletal deformities contribute to frequent breathing disorders. Cardiomyopathy occurs in almost all cases. Intellectual impairment is common but is not inevitable and does not worsen as the disorder progresses. Death usually occurs by age 15, typically from respiratory (lung) disorders.

[0007] Although there are no available treatments for muscular dystrophy, the usefulness of gene replacement as therapy for the disease has been established in transgenic mouse models. Unfortunately, progress toward therapy for human patients has been limited by lack of a suitable technique for delivery of such vectors to large masses of muscle cells. What is needed in the art is a vector that can carry most of the dystrophin coding sequence, that can be cheaply produced in large quantities, that can be delivered to a large mass of muscle cells, and that provides stable expression of dystrophin after delivery.

SUMMARY OF THE INVENTION

[0008] The present invention provides compositions and methods for expressing mini-dystrophin peptides. In particular, the present invention provides compositions comprising nucleic acid sequences that are shorter than wild-type dystrophin cDNA and that express mini-dystrophin peptides that function in a similar manner as wild-type dystrophin proteins. The present invention also provides compositions comprising mini-dystrophin peptides, and methods for expressing mini-dystrophin peptides in target cells.

[0009] The present invention provides such shortened nucleic acid sequences in a variety of ways. For example, the present invention provides nucleic acids encoding only 4, 8, 10, 12, 14, 16, 18, 20 and 22 spectrin-like repeat encoding sequences (i.e. nucleic acids encoding an exact number of spectrin-like repeats). As wild-type dystrophin has 24 spectrin-like repeat encoding sequences, providing nucleic acids encoding fewer numbers of repeats reduces the size of the dystrophin gene (e.g. allowing the nucleic acid sequence to fit into vectors with limited cloning capacity). Another example of such shortened nucleic acid sequences are those that lack at least a portion of the carboxy-terminal domain of wild-type dystrophin nucleic acid. A further example of such shortened nucleic acid sequences are those that lack at least a portion of the 3′ untranslated region, or 5′ untranslated region, or both. In certain embodiments, the present invention provides compositions comprising the peptides expressed by the nucleic acid sequences of the present invention.

[0010] In certain embodiments, the present invention provides compositions comprising nucleic acid encoding a mini-dystrophin peptide, wherein the mini-dystrophin peptide comprises a spectrin-like repeat domain, and wherein the spectrin-like repeat domain consists of n spectrin-like repeats, wherein n is an even number less than 24. In particular embodiments, the present invention provides nucleic acid encoding a mini-dystrophin peptide, wherein the mini-dystrophin peptide comprises a spectrin-like repeat domain comprising n spectrin-like repeats, wherein the mini-dystrophin peptide contains no more than n spectrin-like repeats, and wherein n is an even number that is less than 24 and at least 4. In some embodiments, the present invention provides nucleic acid encoding a mini-dystrophin peptide, wherein the mini-dystrophin peptide comprises n spectrin-like repeats, wherein the mini-dystrophin peptide contains no more than n spectrin-like repeats, and wherein n is an even number that is less than 24 and at least 4.

[0011] In some embodiments, n is 20 or less. In other embodiments, n is 16 or less. In particular embodiments, n is 12 or less. In additional embodiments, n is 8 or less. In preferred embodiments, n is 4. In certain embodiments, n is selected from 4, 8, 10, 12, 14, 16, 18, 20 and 22. In some embodiments, the present invention provides compositions comprising nucleic acid encoding a mini-dystrophin peptide, wherein the mini-dystrophin peptide comprises a spectrin-like repeat domain, and wherein the spectrin-like repeat domain consists of n spectrin-like repeats, wherein n is 4, 8, 12, 16, or 20. In certain embodiments, the present invention provides the peptides encoded by the nucleic acid sequences encoding the mini-dystrophin peptides.

[0012] In certain embodiments, the present invention provides compositions comprising nucleic acid encoding a mini-dystrophin peptide, wherein the mini-dystrophin peptide comprises i) a spectrin-like repeat domain comprising 4 dystrophin spectrin-like repeats, ii) an actin-binding domain, and iii) a β-dystroglycan binding domain; and wherein the mini-dystrophin peptide contains no more than 4 dystrophin spectrin-like repeats.

[0013] In some embodiments, the present invention provides compositions comprising a mini-dystrophin peptide, wherein the mini-dystrophin peptide comprises a spectrin-like repeat domain comprising n spectrin-like repeats, wherein the mini-dystrophin peptide contains no more than n spectrin-like repeats, and wherein n is an even number that is less than 24 and at least 4. In particular embodiments, the present invention provides a cell (or cell line) containing the nucleic acid and peptide sequences of the present invention.

[0014] In certain embodiments, the mini-dystrophin peptide is capable of altering a measurable muscle value in a DMD animal model by at least approximately 10% of the wild type value. In other embodiments, the mini-dystrophin peptide is capable of altering a measurable muscle value in a DMD animal model by at least approximately 20% of the wild type value. In particular embodiments, the mini-dystrophin peptide is capable of altering a measurable muscle value in a DMD animal model by at least approximately 30% of the wild type value. In preferred embodiments, the mini-dystrophin peptide is capable of altering a measurable muscle value in a DMD animal model to a level similar to the wild-type value (e.g. ±4%). In certain embodiments, the nucleic acid comprises at least 2, or at least 4, spectrin-like repeat encoding sequences. In some embodiments, the spectrin-like repeat encoding sequences are precise spectrin-like repeat encoding sequences. In certain embodiments, the nucleic acid is less than 5 kilo-bases in length. In other embodiments, the nucleic acid is less than 6 kilo-bases in length. In particular embodiments, the nucleic acid comprises viral DNA (e.g. adenovirus DNA). In preferred embodiments, the viral DNA comprises adeno-associated viral DNA.

[0015] In certain embodiments, the present invention provides compositions comprising nucleic acid encoding a mini-dystrophin peptide, wherein the mini-dystrophin peptide comprises a spectrin-like repeat domain, and wherein the spectrin-like repeat domain consists of n spectrin-like repeats, wherein n is an even number less than 24; and wherein the nucleic acid comprises an actin-binding domain encoding sequence, a β-dystroglycan-binding domain encoding sequence, and at least 2, or at least 4, spectrin-like repeat encoding sequences. In some embodiments, the nucleic acid comprises at least 4 spectrin-like repeat encoding sequences.

[0016] In certain embodiments, the present invention provides compositions comprising nucleic acid, wherein the nucleic acid comprises at least 2 spectrin-like repeat encoding sequences, and wherein the nucleic acid encodes a mini-dystrophin peptide comprising a spectrin-like repeat domain, wherein the spectrin-like repeat domain consists of n spectrin-like repeats, and wherein n is an even number less than 24. In some embodiments, the nucleic acid comprises at least 4 spectrin-like repeat encoding sequences.

[0017] In some embodiments, the nucleic acid comprises SEQ ID NO:39 (i.e. ΔR4-R23). In other embodiments, the nucleic acid comprises SEQ ID NO:40 (i.e. ΔR2-R21). In certain embodiments, the nucleic acid comprises SEQ ID NO:41 (i.e. ΔR2-R21+H3). In still other embodiments, the nucleic acid comprises SEQ ID NO:42 (i.e. ΔH2-R19).

[0018] In certain embodiments, the nucleic acid comprises an expression vector (e.g. plasmid, virus, etc). In some embodiments, the expression vector comprises viral DNA. In certain embodiments, the viral DNA comprises adeno-viral DNA. In some embodiments, the viral DNA comprises lentiviral DNA. In other embodiments, the viral DNA comprises helper-dependent adeno-viral DNA. In preferred embodiments, the viral DNA comprises adeno-associated viral DNA. In some embodiments, the nucleic acid is inserted in a virus (e.g. adeno-associated virus, adenovirus, helper-dependent adeno-associated virus, lentivirus).

[0019] In certain embodiments, the nucleic acid comprises an actin-binding domain encoding sequence. In particular embodiments, the actin binding domain comprises at least a portion of SEQ ID NO:6 (e.g. 5%, 10%, 20%, 40%, 50%, or 75% of SEQ ID NO:6). In other embodiments, the actin binding domain comprises at least a portion of a homolog or mutated version of SEQ ID NO:6 (e.g. 5%, 10%, 20%, 40%, 50%, or 75% of a SEQ ID NO:6 homolog or mutated version of SEQ ID NO:6). In certain embodiments, the nucleic acid comprises a β-dystroglycan binding domain. In certain embodiments, the β-dystroglycan binding domain comprises at least a portion of a dystrophin hinge 4 encoding sequence (e.g. the 3′ 50% of SEQ ID NO:34), and at least a portion of dystrophin cysteine-rich domain encoding sequence (e.g. the 5′ 75% of SEQ ID NO:35). In particular embodiments, at least a portion of hinge 4 is the WW domain (SEQ ID NO:45), or a homolog or mutation thereof.

[0020] In particular embodiments, the spectrin-like repeat encoding sequences are selected from the group consisting of SEQ ID NOS:8-10, 12-27, and 29-33. In some embodiments, the spectrin-like repeat encoding sequences are selected from the group consisting of SEQ ID NOS:8-10, 12-27, and 29-33, and homologs or mutations of SEQ ID NOS:8-10, 12-27, and 29-33. In preferred embodiments, the spectrin-like repeat encoding sequences are selected from the group consisting of SEQ ID NOS:8-10 and 29-33. In some embodiments, the spectrin-like repeat encoding sequences are identical (e.g. all the sequences are SEQ ID NO:8). In preferred embodiments, the spectrin-like repeat encoding sequences are all different (e.g. the nucleic acid sequence has only 4 spectrin-like repeat encoding sequences, and these 4 are: SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, and SEQ ID NO:33). In certain embodiments, the nucleic acid sequence comprises at least one spectrin-like repeat encoding sequence selected from the group consisting of SEQ ID NOS:8-10, and at least one spectrin-like repeat encoding sequence selected from the group consisting of SEQ ID NOS:29-33.

[0021] In certain embodiments, the nucleic acid (or the resulting peptide) comprises at least one dystrophin hinge region. In some embodiments, the nucleic acid comprises at least one dystrophin hinge region selected from hinge region 1, hinge region 2, hinge region 3 and hinge region 4. In some embodiments, the nucleic acid comprises at least one dystrophin hinge region selected from hinge region 1, hinge region 2, and hinge region 3. In particular embodiments, dystrophin hinge region 1 is SEQ ID NO:7, or a homolog (See, e.g. FIG. 11), or a mutant version thereof. In particular embodiments, dystrophin hinge region 2 is SEQ ID NO:11, or a homolog (See, e.g. FIG. 11), or a mutant version thereof. In certain embodiments, dystrophin hinge region 3 is SEQ ID NO:28, or a homolog (See, e.g. FIG. 11), or a mutant version thereof. In other embodiments, dystrophin hinge region 4 is SEQ ID NO:34, or a homolog (See, e.g. FIG. 11), or a mutant version thereof.

[0022] In some embodiments, the nucleic acid comprises a sequence encoding at least a portion of wild-type dystrophin C-terminal protein. In other embodiments, the nucleic acid comprises at least a portion of the 5′ untranslated region. In particular embodiments, the nucleic acid comprises at least a portion of the 3′ untranslated region. In different embodiments, the nucleic acid sequence comprises regulatory sequences (e.g. MCK enhancer and promoter elements). In particular embodiments, the nucleic acid sequence is operably linked to regulatory sequences (e.g. MCK enhancer and promoter elements). In certain embodiments, the nucleic acid sequence comprises a mutant muscle-specific enhancer region.

[0023] In particular embodiments, the nucleic acid has less than 75% of a wild type dystrophin 5′ untranslated region. In other embodiments, the nucleic acid has less than 50% or 20% or 1% (e.g. 0, 1, 2 nucleotides from a wild type dystrophin 5′ untranslated region). In particularly preferred embodiments, the nucleic acid sequence does not contain any of the wild-type dystrophin 5′ untranslated region. In certain embodiments, the nucleic acid has less than 75% of a wild type dystrophin 3′ untranslated region. In other embodiments, the nucleic acid has less than 50%, preferably less than 40%, more preferably less than 35% of a wild type dystrophin 3′ untranslated region. In certain embodiments, the nucleic acid does not contain a wild-type dystrophin 3′ untranslated region (or, in some embodiments, any type of 3′ untranslated region).

[0024] In particular embodiments, the mini-dystrophin peptide (e.g. encoded by the nucleic acid of the present invention) comprises a substantially deleted dystrophin C-terminal domain. In some embodiments, the mini-dystrophin peptide comprises less than 40% of wild type dystrophin C-terminal domain, preferably less than 30%, more preferably less than 20%, even more preferably less than 1%, and most preferably approximately 0% (e.g. 0, 1, 2, 3 or 4 amino acids from the wild type dystrophin C-terminal domain). In some embodiments, the nucleic acid sequence comprises at least one intron sequence.

[0025] In some embodiments, the present invention provides methods for expressing a mini-dystrophin peptide in a target cell, comprising; a) providing; i) a vector comprising nucleic acid encoding a mini-dystrophin peptide, wherein the mini-dystrophin peptide comprises a spectrin-like repeat domain, and wherein the spectrin-like repeat domain consists of n spectrin-like repeats, wherein n is an even number less than 24, and ii) a target cell, and b) contacting the vector with the target cell under conditions such that the mini-dystrophin peptide is expressed in the target cell. In certain embodiments, the contacting comprises transfecting. In some embodiments, the contacting is performed in vitro. In particular embodiments, the contacting is done in vivo. In other embodiments, the target cell is a muscle cell. In particular embodiments, the target cell further comprises a subject (e.g. with Duchenne muscular dystrophy (DMD) or Becker muscular dystrophy (BMD)). In preferred embodiments, the mini-dystrophin peptide is expressed in the cells of a subject (e.g. such that symptoms of DMD or BMD are reduced or eliminated).

[0026] In certain embodiments, the present invention provides methods comprising; a) providing; i) a vector comprising nucleic acid encoding a mini-dystrophin peptide, wherein the mini-dystrophin peptide comprises a spectrin-like repeat domain comprising n spectrin-like repeats, wherein the mini-dystrophin peptide contains no more than n spectrin-like repeats, and wherein n is an even number that is less than 24 and at least 4, and ii) a subject comprising a target cell (e.g. a subject with symptoms of a muscle disease, such as Muscular Dystrophy); and b) contacting the vector with the subject under conditions such that the mini-dystrophin peptide is expressed in the target cell (e.g. such that the symptoms are reduced or eliminated). In preferred embodiments, the nucleic acid encoding the mini-dystrophin peptide is contained within a viral vector (e.g. adeno-associated viral vector), and the contacting is done by means of injecting the viral vector into the subject.

[0027] In particular embodiments, the present invention provides compositions comprising nucleic acid, wherein the nucleic acid encodes a mini-dystrophin peptide, and wherein the mini-dystrophin peptide comprises a substantially deleted dystrophin C-terminal domain. In some embodiments, the present invention provides the peptides encoded by the nucleic acid of the present invention. In certain embodiments, the substantially deleted dystrophin C-terminal domain is less than 40% of a wild type dystrophin C-terminal domain. In other embodiments, the substantially deleted dystrophin C-terminal domain is less than 30%, 20%, or 1% of a wild type dystrophin C-terminal domain. In preferred embodiments, the substantially deleted dystrophin C-terminal domain is approximately 0% of a wild type dystrophin C-terminal domain. In certain embodiments, the mini-dystrophin peptide does not contain any portion of the wild type dystrophin C-terminal domain (i.e. it is completely deleted).

[0028] In certain embodiments, the mini-dystrophin peptide is capable of altering a measurable muscle value in a DMD animal model by at least 10% of the wild type value. In other embodiments, the mini-dystrophin peptide is capable of altering a measurable muscle value in a DMD animal model by at least 20% of the wild type value. In particular embodiments, the mini-dystrophin peptide is capable of altering a measurable muscle value in a DMD animal model by at least 30% of the wild type value. In preferred embodiments, the mini-dystrophin peptide is capable of altering a measurable muscle value in a DMD animal model to a level similar to the wild-type value (e.g. ±4%).

[0029] In certain embodiments, the nucleic acid comprises an expression vector (e.g. plasmid, virus, etc). In some embodiments, the expression vector comprises viral DNA. In certain embodiments, the viral DNA comprises adeno-viral DNA. In some embodiments, the viral DNA comprises lentiviral DNA. In other embodiments, the viral DNA comprises helper-dependent adeno-viral DNA. In preferred embodiments, the viral DNA comprises adeno-associated viral DNA. In some embodiments, the nucleic acid is inserted in a virus (e.g. adeno-associated virus, adenovirus, helper-dependent adeno-associated virus, lentivirus).

[0030] In certain embodiments, the nucleic acid comprises an actin-binding domain encoding sequence. In particular embodiments, the actin binding domain comprises at least a portion of SEQ ID NO:6 (e.g. 5%, 10%, 20%, 40%, 50%, or 75% of SEQ ID NO:6). In other embodiments, the actin binding domain comprises at least a portion of a homolog or mutated version of SEQ ID NO:6 (e.g. 5%, 10%, 20%, 40%, 50%, or 75% of a SEQ ID NO:6 homolog or mutated version of SEQ ID NO:6). In certain embodiments, the nucleic acid comprises a β-dystroglycan binding domain. In certain embodiments, the β-dystroglycan binding domain comprises at least a portion of a dystrophin hinge 4 encoding sequence (e.g. the 3′ 50% of SEQ ID NO:34), and at least a portion of dystrophin cysteine-rich domain encoding sequence (e.g. the 5′ 75% of SEQ ID NO:35). In particular embodiments, at least a portion of hinge 4 is the WW domain (SEQ ID NO:45), or a homolog or mutation thereof.

[0031] In certain embodiments, the nucleic acid comprises at least one dystrophin hinge region. In some embodiments, the nucleic acid comprises at least one dystrophin hinge region selected from hinge region 1, hinge region 2, hinge region 3 and hinge region 4. In some embodiments, the nucleic acid comprises at least one dystrophin hinge region selected from hinge region 1, hinge region 2, and hinge region 3. In particular embodiments, dystrophin hinge region 1 is SEQ ID NO:7, or a homolog (See, e.g. FIG. 11), or a mutant version thereof. In particular embodiments, dystrophin hinge region 2 is SEQ ID NO:11, or a homolog (See, e.g. FIG. 11), or a mutant version thereof. In certain embodiments, dystrophin hinge region 3 is SEQ ID NO:28, or a homolog (See, e.g. FIG. 11), or a mutant version thereof. In other embodiments, dystrophin hinge region 4 is SEQ ID NO:34, or a homolog (See, e.g. FIG. 11), or a mutant version thereof.

[0032] In other embodiments, the nucleic acid comprises at least a portion of the 5′ untranslated region. In particular embodiments, the nucleic acid comprises at least a portion of the 3′ untranslated region. In different embodiments, the nucleic acid sequence comprises regulatory sequences (e.g. MCK enhancer and promoter elements). In particular embodiments, the nucleic acid sequence is operably linked to regulatory sequences (e.g. MCK enhancer and promoter elements). In certain embodiments, the nucleic acid sequence comprises a mutant muscle-specific enhancer region.

[0033] In particular embodiments, the nucleic acid contains less than 75% of a wild type dystrophin 5′ untranslated region. In other embodiments, the nucleic acid contains less than 50% or 20% or 1% (e.g. 0, 1, 2 nucleotides from a wild type dystrophin 5′ untranslated region). In particularly preferred embodiments, the nucleic acid sequence does not contain any of the wild-type dystrophin 5′ untranslated region. In certain embodiments, the nucleic acid has less than 75% of a wild type dystrophin 3′ untranslated region. In other embodiments, the nucleic acid has less than 50%, preferably less than 40%, more preferably less than 35% of a wild type dystrophin 3′ untranslated region. In certain embodiments, the nucleic acid does not contain a wild-type dystrophin 3′ untranslated region (or, in some embodiments, any type of 3′ untranslated region).

[0034] In some embodiments, the present invention provides methods for expressing a mini-dystrophin peptide in a target cell, comprising; a) providing; i) a vector comprising nucleic acid, wherein the nucleic acid encodes a mini-dystrophin peptide comprising a substantially deleted dystrophin C-terminal domain, and ii) a target cell, and b) contacting the vector with the target cell under conditions such that the mini-dystrophin peptide is expressed in the target cell. In certain embodiments, the contacting comprises transfecting. In other embodiments, the target cell is a muscle cell.

[0035] In certain embodiments, the present invention provides systems and kits with the mini-dystrophin nucleic acid and/or peptide sequences described herein. In certain embodiments, the systems and kits of the present invention comprise a nucleic acid sequence encoding a mini-dystrophin peptide (and/or the mini-dystrophin peptide) and one other component (e.g. an insert component with written instructions for using the mini-dystrophin nucleic acid, or a nucleic acid encoding a vector, or a component for delivering the nucleic acid to a subject, cells for expressing the mini-dystrophin peptide, a buffer, etc.). In certain embodiments, the present invention provides a computer readable medium (e.g. CD, hard drive, floppy disk, magnetic tape, etc.) that contains the nucleic acid or amino acid sequences of the present invention (e.g. a computer readable representation of the nucleotide bases used to make a mini-dystrophin nucleic acid sequence).

[0036] In some embodiments, the present invention provides mini-dystrophin nucleic acid sequences for use as a medicament. In other embodiments, the present invention provides mini-dystrophin peptides for use as a medicament. In particular embodiments, the present invention provides the use of mini-dystrophin nucleic acid sequences for preparing a drug for a therapeutic application. In additional embodiments, the present invention provides the use of mini-dystrophin peptides for preparing a drug for a therapeutic application. In some embodiments, the present invention provides mini-dystrophin nucleic acid sequences for the preparation of a composition for the treatment of a muscle disease (e.g. DMD). In other embodiments, the present invention provides mini-dystrophin peptides for the preparation of a composition for the treatment of a muscle disease (e.g. DMD).

DESCRIPTION OF THE FIGURES

[0037] FIG. 1 shows the nucleic acid sequence for wild-type human dystrophin cDNA.

[0038] FIG. 2 shows the nucleic acid sequence for wild-type mouse dystrophin cDNA.

[0039] FIG. 3 shows the nucleic acid sequence for wild-type human utrophin cDNA.

[0040] FIG. 4 shows the nucleic acid sequence for wild-type mouse utrophin cDNA.

[0041] FIG. 5 shows various domains of the nucleic acid sequence for wild-type human dystrophin cDNA.

[0042] FIG. 6 shows various domains of the nucleic acid sequence for wild-type human dystrophin cDNA.

[0043] FIG. 7 shows various domains of the nucleic acid sequence for wild-type human dystrophin cDNA.

[0044] FIG. 8 shows various domains of the nucleic acid sequence for wild-type human dystrophin cDNA.

[0045] FIG. 9 shows various domains of the nucleic acid sequence for wild-type human dystrophin cDNA.

[0046] FIG. 10 shows the 3′ UTR domain nucleic acid sequence for wild-type human dystrophin cDNA.

[0047] FIG. 11 shows a sequence alignment between wild-type human dystrophin cDNA and wild-type mouse dystrophin cDNA. The various domains in the human dystrophin sequence have spaces between them with the ends highlighted in bold. In this regard, homologous sequences for various domains in the mouse cDNA sequence are seen.

[0048] FIG. 12 shows the nucleic acid sequence for ΔR4-R23, a nucleic acid sequence encoding a mini-dystrophin peptide.

[0049] FIG. 13 shows the nucleic acid sequence for ΔR2-R21, a nucleic acid sequence encoding a mini-dystrophin peptide.

[0050] FIG. 14 shows the nucleic acid sequence for ΔR2-R21+H3, a nucleic acid sequence encoding a mini-dystrophin peptide.

[0051] FIG. 15 shows the nucleic acid sequence for ΔH2-R19, a nucleic acid sequence encoding a mini-dystrophin peptide.

[0052] FIG. 16 shows the complete cDNA sequence for human skeletal muscle alpha actinin.

[0053] FIG. 17 shows the nucleic acid sequence for ΔR9-R16, a nucleic acid sequence encoding a mini-dystrophin peptide.

[0054] FIG. 18 shows the nucleic acid sequence for the WW domain.

[0055] FIG. 19 shows various transgenic expression constructs tested in Example 1.

[0056] FIG. 20 shows the contractile properties of EDL, soleus, and diaphragm muscles in wild-type, mdx, and dystrophin Δ71-78 mice.

[0057] FIG. 21 shows the nucleic acid sequence for pBSX.

[0058] FIG. 22 shows a restriction map for pBSX.

[0059] FIG. 23 shows the ‘full-length’ HDMD sequence.

[0060] FIG. 24 shows the cloning procedure for ΔR4-R23.

[0061] FIG. 25 shows the cloning procedure for ΔR2-R21+H3.

[0062] FIG. 26 shows the cloning procedure for ΔR2-R21.

[0063] FIG. 27 shows a schematic illustration of the domains encoded by the truncated and full-length dystrophin sequences tested in Example 5.

[0064] FIG. 28 is a graph showing the percentage of myofibers in quadriceps muscles of 3 month old mice that display centrally-located nuclei in the indicated strains of transgenic mice.

[0065] FIG. 29 shows graphs depicting the force generating capacity in diaphragm (A) or EDL (B) muscles of the indicated strains of dystrophin transgenic mdx mice and control mice.

[0066] FIG. 30 shows a graph depicting the force generating capacity in EDL (A) or diaphragm (B) muscles of the indicated strains of dystrophin transgenic mdx mice and control mice.

[0067] FIG. 31 is a graph showing the percentage of force generating capacity lost after 1 or 2 lengthening contractions of the tibialis anterior muscle of the indicated strains of dystrophin transgenic mdx mice and control mice.

[0068] FIG. 32 is a graph showing the total distance run on a treadmill by animals from the indicated strains of dystrophin transgenic mdx mice and control mice.

[0069] FIG. 33 shows a graph depicting the total body mass (A) and mass of the tibialis anterior muscle (B) of the indicated strains of dystrophin transgenic mdx mice and control mice.

[0070] FIG. 34 is a schematic illustration of the structure of a mini-dystrophin expression cassette inserted into an adeno-associated viral vector.

[0071] FIG. 35 is a schematic illustration of the structure of plasmid pTZ19R (top) and the sequence of the multiple cloning site in the vector (bottom).

[0072] FIG. 36 shows the nucleic acid sequence of various MCK enhancer regions (wild-type and mutant).

[0073] FIG. 37 shows the nucleic acid sequence of various MCK promoter regions.

[0074] FIG. 38 shows a comparison between domains in dystrophin and utrophin.

DEFINITIONS

[0075] To facilitate an understanding of the present invention, a number of terms and phrases are defined below:

[0076] As used herein, the term “measurable muscle values” refers to measurements of dystrophic symptoms (e.g. fibrosis, an increased proportion of centrally located nuclei, reduced force generation by skeletal muscle, etc.) in an animal. These measurements may be taken, for example, to determine the wild-type value (i.e. the value in a control animal), to determine the value in a DMD (Duchenne muscular dystrophy) animal model (e.g. in an mdx mouse model), and to determine the value in a DMD animal model expressing the mini-dystrophin peptides of the present invention. Various assays may be employed to determine measurable muscle values in an animal including, but not limited to, assays measuring fibrosis, phagocytic infiltration of muscle tissue, variation in myofiber size, an increased proportion of myofibers with centrally located nuclei, elevated serum levels of muscle pyruvate kinase, contractile properties assays, DAP (dystrophin associated protein) assays, susceptibility to contraction induced injuries and measured force assays (See Examples 1 and 4).

[0077] As used herein, the term “mini-dystrophin peptide” refers to a peptide that is smaller in size than the full-length wild-type dystrophin peptide, and that is capable of altering (increasing or decreasing) a measurable muscle value in a DMD animal model by at least approximately 10% such that the value is closer to the wild-type value (e.g. an mdx mouse has a measurable muscle value that is 50% of the wild-type value, and this value is increased to at least 60% of the wild-type value; or an mdx mouse has a measurable muscle value that is 150% of the wild-type value, and this value is decreased to at most 140% of the wild-type value). In some embodiments, the mini-dystrophin peptide is capable of altering a measurable muscle value in a DMD animal model by at least approximately 20% of the wild type value. In certain embodiments, the mini-dystrophin peptide is capable of altering a measurable muscle value in a DMD animal model by at least approximately 30% of the wild type value. In preferred embodiments, the mini-dystrophin peptide is capable of altering a measurable muscle value in a DMD animal model to a level similar to the wild-type value (e.g. ±4%).
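The threshold in this definition can be illustrated with a short calculation. The sketch below is only one possible reading of the definition: it assumes that the shift is expressed as a fraction of the wild-type value and that "closer to the wild-type value" means a reduced absolute distance from it. The function name and its parameters are hypothetical and are not part of the specification.

```python
def meets_mini_dystrophin_threshold(wild_type, mdx, treated, min_shift=0.10):
    """Return True if the treated value moved toward the wild-type value by at
    least min_shift, expressed as a fraction of the wild-type value.

    Assumption (not stated in this form in the text): improvement is measured
    as the reduction in absolute distance from the wild-type value.
    """
    # Express both measurements relative to the wild-type value (wild type = 1.0).
    mdx_rel = mdx / wild_type
    treated_rel = treated / wild_type
    # Reduction in distance from wild type after expressing the peptide.
    improvement = abs(mdx_rel - 1.0) - abs(treated_rel - 1.0)
    # Small tolerance so that an exact 10% shift is not lost to float rounding.
    return improvement >= min_shift - 1e-9


# The two examples given in the definition: 50% -> 60% and 150% -> 140%.
print(meets_mini_dystrophin_threshold(100.0, 50.0, 60.0))
print(meets_mini_dystrophin_threshold(100.0, 150.0, 140.0))
# A 5% shift falls short of the approximately 10% threshold.
print(meets_mini_dystrophin_threshold(100.0, 50.0, 55.0))
```

Passing min_shift=0.20 or min_shift=0.30 corresponds to the 20% and 30% embodiments mentioned in the same paragraph.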

[0078] As used herein, the term “wild-type dystrophin cysteine-rich domain” refers to a peptide encoded by the nucleic acid sequences in SEQ ID NO:35 (e.g. in human), as well as wild type peptide homologs encoded by nucleic acid homologs of SEQ ID NO:35 (See, FIG. 11).

[0079] As used herein, the term “wild type dystrophin C-terminal domain” refers to a peptide encoded by the nucleic acid sequences in SEQ ID NO:36 (e.g. in human), as well as wild type peptide homologs encoded by nucleic acid homologs of SEQ ID NO:36 (See, FIG. 11).

[0080] As used herein, the term “mini-dystrophin peptide comprising a substantially deleted dystrophin C-terminal domain” refers to a mini-dystrophin peptide that has less than 45% of a wild type dystrophin C-terminal domain. In some embodiments, the mini-dystrophin peptide comprises less than 40% of a wild type dystrophin C-terminal domain, preferably less than 30%, more preferably less than 20%, even more preferably less than 1%, and most preferably approximately 0% (e.g. 0, 1, 2, 3 or 4 amino acids from the wild type dystrophin C-terminal domain). The construction of mini-dystrophin peptides with a substantially deleted dystrophin C-terminal domain may be accomplished, for example, by deleting all or a portion of SEQ ID NO:36 from human dystrophin SEQ ID NO:1 (See, e.g. Example 3C).

[0081] As used herein, the term “wild type dystrophin 5′ untranslated region” refers to the nucleic acid sequence at the very 5′ end of a wild type dystrophin nucleic acid sequence (e.g. SEQ ID NOS:1 and 2) that immediately precedes the amino acid coding regions. For example, for human dystrophin, SEQ ID NO:5 (the first 208 bases) is the 5′ untranslated region (a homolog in mouse may be seen in FIG. 11).

[0082] As used herein, the term “wild type dystrophin 3′ untranslated region” refers to the nucleic acid sequence at the very 3′ end of a wild type dystrophin nucleic acid sequence (e.g. SEQ ID NOS:1 and 2) that immediately follows the amino acid coding regions. For example, for human dystrophin, SEQ ID NO:38 (the last 2690 bases of the human dystrophin gene) is the 3′ untranslated region (a homolog in mouse may be seen in FIG. 11).

[0083] As used herein, the term “actin-binding domain encoding sequence” refers to the portion of a dystrophin nucleic acid sequence that encodes a peptide-domain capable of binding actin in vitro (e.g. SEQ ID NO:6), as well as homologs (See, FIG. 11), conservative mutations, and truncations of such sequences that encode peptide-domains that are capable of binding actin in vivo. Determining whether a particular nucleic acid sequence encodes a peptide-domain (e.g. homolog, mutation, or truncation of SEQ ID NO:6) that will bind actin in vitro may be performed, for example, by screening the ability of the peptide-domain to bind actin in vitro in a simple actin binding assay (See, Corrado et al., FEBS Letters, 344:255-260 [1994], describing the expression of candidate dystrophin peptides as fusion proteins, adsorbing F-actin onto microtiter plates, incubating the candidate peptides in the F-actin coated microtiter plates, washing the plates, adding anti-fusion protein rabbit antibody, and adding an anti-rabbit antibody conjugated to a detectable marker).

[0084] As used herein, the term “β-dystroglycan-binding domain encoding sequence” refers to the portion of a dystrophin nucleic acid sequence that encodes a peptide-domain capable of binding β-dystroglycan in vivo (e.g. SEQ ID NOs:34 and 35), as well as homologs (See, FIG. 11), conservative mutations, and truncations of such sequences that encode peptide-domains that are capable of binding β-dystroglycan in vivo. In preferred embodiments, the β-dystroglycan-binding domain encoding sequence includes at least a portion of a hinge 4 encoding region (e.g. SEQ ID NO:45, the WW domain) and at least a portion of a wild-type dystrophin cysteine-rich domain (e.g. at least a portion of SEQ ID NO:35) (See, e.g. Jung et al., JBC, 270 (45):27305 [1995]). Determining whether a particular nucleic acid sequence encodes a peptide-domain (e.g. homolog, mutation, or truncation) that will bind β-dystroglycan in vivo may be performed, for example, by first screening the ability of the peptide-domain to bind β-dystroglycan in vitro in a simple β-dystroglycan binding assay (See, Jung et al., pg 27306, describing constructing peptide-domain dystrophin-GST fusion peptides and radioactively labelled β-dystroglycan, immobilizing the fusion proteins on glutathione-agarose beads, incubating the beads with the radioactively labelled β-dystroglycan, pelleting the beads, washing the beads, and resolving the sample on an SDS-polyacrylamide gel, staining with Coomassie blue, exposing to film, and quantifying the amount of radioactivity present). Nucleic acid sequences found to express peptides capable of binding β-dystroglycan in such assays may then, for example, be tested in vivo by transfecting a cell line (e.g., COS cells) with two expression vectors, one expressing the dystroglycan peptide and the other expressing the candidate peptide domain (as a fusion protein). After culturing the cells, the protein is then extracted and a co-immunoprecipitation is performed for one of the proteins, followed by a Western blot for the other.

[0085] As used herein, the term “spectrin-like repeats” refers to peptides composed of approximately 100 amino acids that are responsible for the rod-like shape of many structural proteins including, but not limited to, dystrophin, utrophin, fodrin, alpha-actinin, and spectrin, when the spectrin-like repeats are present in multiple copies (e.g. dystrophin-24, utrophin-22, alpha-actinin-4, spectrin-16, etc). Spectrin-like repeats also refers to mutations of these natural peptides, such as conservative changes in amino acid sequence, as well as the addition or deletion of up to 5 amino acids to/from the end of a spectrin-like repeat. Spectrin-like repeats includes ‘precise spectrin-like repeats’ (see below). Examples of spectrin-like repeats include, but are not limited to, peptides encoded by nucleic acid sequences found in wild-type human dystrophin (e.g. SEQ ID NOS:8-10, 12-27, and 29-33).

[0086] As used herein, the term “spectrin-like repeat encoding sequences” refers to nucleic acid sequences encoding spectrin-like repeat peptides. This term includes natural and synthetic nucleic acid sequences encoding the spectrin-like repeats (e.g. both the naturally occurring and mutated spectrin-like repeat peptides). Examples of spectrin-like repeat encoding sequences include, but are not limited to, SEQ ID NOS:8-10, 12-27, and 29-33.

[0087] As used herein, the term “precise spectrin-like repeat encoding sequences” refers to nucleic acid sequences encoding spectrin-like repeat peptides with up to 1 additional amino acid added to, or deleted from, the spectrin-like repeat.

[0088] As used herein, the term “spectrin-like repeat domain” refers to the region in a mini-dystrophin peptide that contains the spectrin-like repeats of the mini-dystrophin peptide.

[0089] The term “gene” refers to a DNA sequence that comprises control and coding sequences necessary for the production of a polypeptide or precursor thereof. The polypeptide can be encoded by a full length coding sequence or by any portion of the coding sequence so long as the desired enzymatic activity is retained. The term “gene” encompasses both cDNA and genomic forms of a given gene.

[0090] The term “wild-type” refers to a gene, gene product, or other sequence that has the characteristics of that gene or gene product when isolated from a naturally occurring source. A wild-type gene is that which is most frequently observed in a population and is thus arbitrarily designated the “normal” or “wild-type” form of the gene. In contrast, the term “modified” or “mutant” refers to a gene, gene product, or other sequence that displays modifications in sequence and/or functional properties (e.g. altered characteristics) when compared to the wild-type gene or gene product. It is noted that naturally-occurring mutants can be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product.

[0091] The term “oligonucleotide” as used herein is defined as a molecule comprised of two or more deoxyribonucleotides or ribonucleotides, usually more than three (3), and typically more than ten (10) and up to one hundred (100) or more (although preferably between twenty and thirty). The exact size will depend on many factors, which in turn depend on the ultimate function or use of the oligonucleotide. The oligonucleotide may be generated in any manner, including chemical synthesis, DNA replication, reverse transcription, or a combination thereof.

[0092] As used herein, the term “regulatory sequence” refers to a genetic sequence or element that controls some aspect of the expression of nucleic acid sequences. For example, a promoter is a regulatory element that facilitates the initiation of transcription of an operably linked coding region. Other regulatory elements are enhancers, splicing signals, polyadenylation signals, termination signals, etc. Examples include, but are not limited to, the 5′ UTR of the dystrophin gene (SEQ ID NO:5), MCK promoters and enhancers (both wild type and mutant, See U.S. provisional app. Ser No. 60/218,436, filed Jul. 14, 2000, and International Application PCT/US01/22092, filed Jul. 13, 2001, both of which are hereby incorporated by reference).

[0093] Transcriptional control signals in eucaryotes comprise “promoter” and “enhancer” elements. Promoters and enhancers consist of short arrays of DNA sequences that interact specifically with cellular proteins involved in transcription. The present invention contemplates modified enhancer regions.

[0094] The term “recombinant DNA vector” as used herein refers to DNA sequences containing a desired coding sequence and appropriate DNA sequences necessary for the expression of the operably linked coding sequence in a particular host organism (e.g., mammal). DNA sequences necessary for expression in procaryotes include a promoter, optionally an operator sequence, a ribosome binding site and possibly other sequences. Eukaryotic cells are known to utilize promoters, polyadenylation signals and enhancers.

[0095] The terms “in operable combination”, “in operable order” and “operably linked” as used herein refer to the linkage of nucleic acid sequences in such a manner that a nucleic acid molecule capable of directing the transcription of a given gene and/or the synthesis of a desired protein molecule is produced. The term also refers to the linkage of amino acid sequences in such a manner that a functional protein is produced.

[0096] “Hybridization” methods involve the annealing of a complementary sequence to the target nucleic acid (the sequence to be detected). The ability of two polymers of nucleic acid containing complementary sequences to find each other and anneal through base pairing interaction is a well-recognized phenomenon.

[0097] The “complement” of a nucleic acid sequence as used herein refers to an oligonucleotide which, when aligned with the nucleic acid sequence such that the 5′ end of one sequence is paired with the 3′ end of the other, is in “antiparallel association.” Complementarity need not be perfect; stable duplexes may contain mismatched base pairs or unmatched bases. Those skilled in the art of nucleic acid technology can determine duplex stability empirically considering a number of variables including, for example, the length of the oligonucleotide, base composition and sequence of the oligonucleotide, ionic strength and incidence of mismatched base pairs.

[0098] The term “homology” refers to a degree of complementarity. There may be partial homology or complete homology (i.e., identity). A partially complementary sequence that at least partially inhibits a completely complementary sequence from hybridizing to a target nucleic acid is referred to using the functional term “substantially homologous.” The inhibition of hybridization of the completely complementary sequence to the target sequence may be examined using a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency. A substantially homologous sequence or probe will compete for and inhibit the binding (i.e., the hybridization) of a completely homologous sequence to a target under conditions of low stringency. This is not to say that conditions of low stringency are such that non-specific binding is permitted; low stringency conditions require that the binding of two sequences to one another be a specific (i.e., selective) interaction. The absence of non-specific binding may be tested by the use of a second target that lacks even a partial degree of complementarity (e.g., less than about 30% identity); in the absence of non-specific binding the probe will not hybridize to the second non-complementary target.

[0099] As used herein the term “stringency” is used in reference to the conditions of temperature, ionic strength, and the presence of other compounds such as organic solvents, under which nucleic acid hybridizations are conducted. Those skilled in the art will recognize that “stringency” conditions may be altered by varying the parameters just described either individually or in concert. With “high stringency” conditions, nucleic acid base pairing will occur only between nucleic acid fragments that have a high frequency of complementary base sequences (e.g., hybridization under “high stringency” conditions may occur between homologs with about 85-100% identity, preferably about 70-100% identity). With medium stringency conditions, nucleic acid base pairing will occur between nucleic acids with an intermediate frequency of complementary base sequences (e.g., hybridization under “medium stringency” conditions may occur between homologs with about 50-70% identity). Thus, conditions of “weak” or “low” stringency are often required with nucleic acids that are derived from organisms that are genetically diverse, as the frequency of complementary sequences is usually less.

[0100] Low stringency conditions when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 42° C. in a solution consisting of 5× SSPE (43.8 g/l NaCl, 6.9 g/l NaH₂PO₄·H₂O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.1% SDS, 5× Denhardt's reagent [50× Denhardt's contains per 500 ml: 5 g Ficoll (Type 400, Pharmacia), 5 g BSA (Fraction V, Sigma)] and 100 μg/ml denatured salmon sperm DNA followed by washing in a solution comprising 5× SSPE, 0.1% SDS at 42° C. when a probe of about 500 nucleotides in length is employed.

[0101] High stringency conditions when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridizing at 42° C. in a solution consisting of 5× SSPE (43.8 g/l NaCl, 6.9 g/l NaH₂PO₄·H₂O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5× Denhardt's reagent and 100 μg/ml denatured salmon sperm DNA, followed by washing in a solution comprising 0.1× SSPE, 1.0% SDS at 42° C. when a probe of about 500 nucleotides in length is employed.
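The practical difference between the two wash steps can be made concrete with a short calculation. The sketch below takes the NaCl content stated for 5× SSPE (43.8 g/l) and compares the NaCl concentration of the 5× SSPE wash with that of the 0.1× SSPE wash. The molar mass of NaCl (58.44 g/mol) is a standard value that is not given in the text, and the function and variable names are hypothetical. The sketch considers NaCl only and ignores the phosphate, EDTA, and SDS components.

```python
NACL_G_PER_MOL = 58.44        # molar mass of NaCl (standard value, not from the text)
NACL_G_PER_L_AT_5X = 43.8     # NaCl content of 5x SSPE as stated above


def nacl_molarity(sspe_fold):
    """Approximate NaCl molarity of an SSPE solution at the given fold strength."""
    # Scale the stated 5x recipe linearly to the requested strength.
    grams_per_litre = NACL_G_PER_L_AT_5X * (sspe_fold / 5.0)
    return grams_per_litre / NACL_G_PER_MOL


low_stringency_wash = nacl_molarity(5.0)    # wash in 5x SSPE
high_stringency_wash = nacl_molarity(0.1)   # wash in 0.1x SSPE
fold_difference = low_stringency_wash / high_stringency_wash

print(f"5x SSPE wash:   {low_stringency_wash:.3f} M NaCl")
print(f"0.1x SSPE wash: {high_stringency_wash:.4f} M NaCl")
print(f"fold difference in NaCl: {fold_difference:.0f}")
```

Under these assumptions the high stringency wash contains roughly one fiftieth of the salt of the low stringency wash, which is the main reason less stable (more mismatched) duplexes are removed at the same temperature.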

[0102] The art knows well that numerous equivalent conditions may be employed to comprise low stringency conditions; factors such as the length and nature (DNA, RNA, base composition) of the probe and nature of the target (DNA, RNA, base composition, present in solution or immobilized, etc.) and the concentration of the salts and other components (e.g., the presence or absence of formamide, dextran sulfate, polyethylene glycol) are considered and the hybridization solution may be varied to generate conditions of low stringency hybridization different from, but equivalent to, the above listed conditions. In addition, the art knows conditions that promote hybridization under conditions of high stringency (e.g., increasing the temperature of the hybridization and/or wash steps, the use of formamide in the hybridization solution, etc.).

[0103] The term “transfection” as used herein refers to the introduction of foreign DNA into eukaryotic cells. Transfection may be accomplished by a variety of means known to the art including calcium phosphate-DNA co-precipitation, DEAE-dextran-mediated transfection, polybrene-mediated transfection, electroporation, microinjection, liposome fusion, lipofection, protoplast fusion, retroviral infection, and biolistics.

[0104] The term “stable transfection” or “stably transfected” refers to the introduction and integration of foreign DNA into the genome of the transfected cell. The term “stable transfectant” refers to a cell which has stably integrated foreign DNA into the genomic DNA.

[0105] As used herein, the terms “nucleic acid molecule encoding,” “DNA sequence encoding,” and “DNA encoding” refer to the order or sequence of deoxyribonucleotides along a strand of deoxyribonucleic acid. The order of these deoxyribonucleotides determines the order of amino acids along the polypeptide (protein) chain. The DNA sequence thus codes for the amino acid sequence.

[0106] As used herein, the term “muscle cell” refers to a cell derived from muscle tissue, including, but not limited to, cells derived from skeletal muscle, smooth muscle (e.g. from the digestive tract, urinary bladder, and blood vessels), and cardiac muscle. The term includes muscle cells in vitro, ex vivo, and in vivo. Thus, for example, an isolated cardiomyocyte would constitute a muscle cell, as would a cell as it exists in muscle tissue present in a subject in vivo. This term also encompasses both terminally differentiated and nondifferentiated muscle cells, such as myocytes, myotubes, myoblasts, cardiomyocytes, and cardiomyoblasts.

[0107] As used herein, the term “muscle-specific” in reference to a regulatory element (e.g. enhancer region, promoter region) means that the transcriptional activity driven by these regions is mostly in muscle cells or tissue (e.g. 20:1) compared to the activity conferred by the regulatory sequences in other tissues. An assay to determine the muscle-specificity of a regulatory region is provided in Example 5 below (measuring beta-galactoside in muscle cells and liver cells from a mouse transfected with an expression vector).

[0108] As used herein, the term “mutant muscle-specific enhancer region” refers to a wild-type muscle-specific enhancer region that has been modified (e.g. deletion, insertion, addition, substitution), and in particular, has been modified to contain an additional MCK-R control element (See U.S. Prov. App. Ser. No. 60/218,436, hereby incorporated by reference, and section IV below).

DESCRIPTION OF THE INVENTION

[0109] The present invention provides compositions and methods for expressing mini-dystrophin peptides. In particular, the present invention provides compositions comprising nucleic acid sequences that are shorter than wild-type dystrophin cDNA and that express mini-dystrophin peptides that function in a similar manner as wild-type dystrophin proteins. The present invention also provides compositions comprising mini-dystrophin peptides, and methods for expressing mini-dystrophin peptides in target cells.

[0110] The present invention provides such shortened nucleic acid sequences (and resulting peptides) in a variety of ways. For example, the present invention provides nucleic acid encoding only 4, 8, 12, 16, and 20 spectrin-like repeat encoding sequences (i.e. nucleic acid encoding an exact number of spectrin-like repeats that are multiples of 4). As wild-type dystrophin has 24 spectrin-like repeat encoding sequences, providing nucleic acid encoding fewer numbers of repeats reduces the size of the dystrophin gene (e.g. allowing the nucleic acid sequence to fit into vectors with limited cloning capacity). Another example of such shortened nucleic acid sequences are those that lack at least a portion of the carboxy-terminal domain of wild-type dystrophin nucleic acid. A further example of such shortened nucleic acid sequences are those that lack at least a portion of the 3′ untranslated region, or 5′ untranslated region, or both.

[0111] I. Dystrophin

[0112] A. Dystrophin Structure

[0113] In some embodiments, the present invention provides gene constructs comprising spectrin-like repeats from human dystrophin. Dystrophin is a 427 kDa cytoskeletal protein and is a member of the spectrin/α-actinin superfamily (See e.g., Blake et al., Brain Pathology, 6:37 [1996]; Winder, J. Muscle Res. Cell. Motil., 18:617 [1997]; and Tinsley et al., PNAS, 91:8307 [1994]). The N-terminus of dystrophin binds to actin, having a higher affinity for non-muscle actin than for sarcomeric actin. Dystrophin is involved in the submembraneous network of non-muscle actin underlying the plasma membrane. Dystrophin is associated with an oligomeric, membrane spanning complex of proteins and glycoproteins, the dystrophin-associated protein complex (DPC). The N-terminus of dystrophin has been shown in vitro to contain a functional actin-binding domain. The C-terminus of dystrophin binds to the cytoplasmic tail of β-dystroglycan, and in concert with actin, anchors dystrophin to the sarcolemma. Also bound to the C-terminus of dystrophin are the cytoplasmic members of the DPC. Dystrophin thereby provides a link between the actin-based cytoskeleton of the muscle fiber and the extracellular matrix. It is this link that is disrupted in muscular dystrophy.

[0114] The central rod domain of dystrophin is composed of a series of 24 weakly repeating units of approximately 110 amino acids, similar to those found in spectrin (i.e., spectrin-like repeats). This domain constitutes the majority of dystrophin and gives dystrophin a flexible rod-like structure. The rod-domain is interrupted by four hinge regions that are rich in proline. It is contemplated that the rod-domain provides a structural link between members of the DPC. Table 1 shows an overview of the structural and functional domains of human dystrophin.

TABLE 1
Full Length Human Dystrophin cDNA

Nucleotides    Feature                             SEQ ID NO:
1-208          5′ untranslated region              SEQ ID NO:5
209-211        Start codon (ATG)                   -
209-964        N terminus                          SEQ ID NO:6
965-1219       Hinge 1                             SEQ ID NO:7
1220-1546      Spectrin-like repeat No. 1          SEQ ID NO:8
1547-1879      Spectrin-like repeat No. 2          SEQ ID NO:9
1880-2212      Spectrin-like repeat No. 3          SEQ ID NO:10
2213-2359      Hinge 2                             SEQ ID NO:11
2360-2692      Spectrin-like repeat No. 4          SEQ ID NO:12
2693-3019      Spectrin-like repeat No. 5          SEQ ID NO:13
3020-3346      Spectrin-like repeat No. 6          SEQ ID NO:14
3347-3673      Spectrin-like repeat No. 7          SEQ ID NO:15
3674-4000      Spectrin-like repeat No. 8          SEQ ID NO:16
4001-4312      Spectrin-like repeat No. 9          SEQ ID NO:17
4313-4588      Spectrin-like repeat No. 10         SEQ ID NO:18
4589-4915      Spectrin-like repeat No. 11         SEQ ID NO:19
4916-5239      Spectrin-like repeat No. 12         SEQ ID NO:20
5340-5551      Spectrin-like repeat No. 13         SEQ ID NO:21
5552-5833      Spectrin-like repeat No. 14         SEQ ID NO:22
5834-6127      Spectrin-like repeat No. 15         SEQ ID NO:23
6128-6187      20 amino acid insert (not hinge)    -
6188-6514      Spectrin-like repeat No. 16         SEQ ID NO:24
6515-6835      Spectrin-like repeat No. 17         SEQ ID NO:25
6836-7186      Spectrin-like repeat No. 18         SEQ ID NO:26
7187-7489      Spectrin-like repeat No. 19         SEQ ID NO:27
7490-7612      Hinge 3                             SEQ ID NO:28
7613-7942      Spectrin-like repeat No. 20         SEQ ID NO:29
7943-8269      Spectrin-like repeat No. 21         SEQ ID NO:30
8270-8617      Spectrin-like repeat No. 22         SEQ ID NO:31
8618-9004      Spectrin-like repeat No. 23         SEQ ID NO:32
9005-9328      Spectrin-like repeat No. 24         SEQ ID NO:33
9329-9544      Hinge 4                             SEQ ID NO:34
9545-10431     Start of C terminus                 SEQ ID NO:35
10432-11254    Alternatively spliced exons 71-78   SEQ ID NO:36
11255-11266    End of Coding Region                SEQ ID NO:37
11267-13957    3′ untranslated region              SEQ ID NO:38
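The coordinates in Table 1 can be used to estimate how much sequence a given deletion removes. The following is a minimal sketch, assuming 1-based inclusive coordinates and, purely for illustration, a single contiguous deletion running from the start of spectrin-like repeat No. 4 to the end of spectrin-like repeat No. 23 (a span that also covers the intervening 20 amino acid insert and hinge 3). It is not a description of how any construct in the Examples was actually built, and the dictionary and function names are hypothetical.

```python
# Nucleotide coordinates copied from Table 1 (1-based, inclusive).
FEATURES = {
    "5' UTR": (1, 208),
    "N terminus": (209, 964),
    "Hinge 1": (965, 1219),
    "Repeat 1": (1220, 1546),
    "Repeat 4": (2360, 2692),
    "Repeat 23": (8618, 9004),
    "Repeat 24": (9005, 9328),
    "Hinge 4": (9329, 9544),
}
FULL_LENGTH_NT = 13957  # last nucleotide position listed in Table 1


def length_nt(start, end):
    """Length in nucleotides of an inclusive 1-based interval."""
    return end - start + 1


def deleted_span_nt(first_feature, last_feature):
    """Nucleotides removed by one contiguous deletion running from the start
    of first_feature through the end of last_feature."""
    start = FEATURES[first_feature][0]
    end = FEATURES[last_feature][1]
    return length_nt(start, end)


removed = deleted_span_nt("Repeat 4", "Repeat 23")
remaining = FULL_LENGTH_NT - removed
in_frame = removed % 3 == 0  # a whole number of codons keeps the reading frame

print(f"Repeat 1 length: {length_nt(*FEATURES['Repeat 1'])} nt")
print(f"removed: {removed} nt, remaining: {remaining} nt, in frame: {in_frame}")
```

The same helper can be applied to other rows of the table, for example to compare the size of the 3′ untranslated region with the coding region when deciding which portions to omit for a vector with limited cloning capacity.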

[0115] B. Spectrin-Like Repeats

[0116] Spectrin-like repeats are about 100 amino acids long and are found in a number of proteins, including the actin binding proteins spectrin, fodrin, α-actinin, and dystrophin, but their function remains unclear (Dhermy, 1991. Biol. Cell, 71:249-254). These domains may be involved in connecting functional domains and/or mediating protein-protein interactions. The many tandem, spectrin-like motifs that comprise most of the mass of the proteins in this superfamily are responsible for their similar flexible, rod-like molecular shapes. Although these homologous motifs are frequently called repeats or repetitive segments, adjacent segments in each protein are only distantly related evolutionarily.

[0117] Spectrin is a cytoskeletal protein of red blood cells that is associated with the cytoplasmic side of the lipid bilayer (See e.g., Speicher and Ursitti, Current Biology, 4:154 [1994]). Spectrin is a long, thin, flexible rod-shaped protein that constitutes about 25% of the membrane-associated protein mass. Spectrin is composed of two large polypeptide chains, α-spectrin (˜240 kDa) and β-spectrin (˜220 kDa) and serves to cross-link short actin oligomers to form a dynamic two-dimensional submembrane latticework. Spectrin isoforms have been found in numerous cell types and have been implicated in a variety of functions.

[0118] The recent determination of the crystal structure of a single domain of spectrin provides insight into the structure and function of an entire class of large actin cross-linking proteins (Yan et al., Science, 262:2027 [1993]). The domain is an example of a spectrin-like repeat. Early analysis of spectrin-like repeats by partial peptide sequence analysis demonstrated that most of the antiparallel spectrin heterodimer is made up of homologous 106 residue motifs. Subsequent sequence analyses of cDNAs confirmed that this small motif is the major building block for all spectrin isoforms, as well as for the related actinins and dystrophins (Matsudaira, Trends Biochem Sci, 16:87 [1991]).

[0119] Given their similar sequences, all spectrin motifs are expected to have related, but not identical, three-dimensional structures. The structure of a single Drosophila spectrin motif, 14, which has now been determined (Yan et al., Science, 262:2027 [1993]), should therefore provide insight into the overall conformation of spectrins in particular and, to a more limited extent, the other members of the spectrin superfamily. The structure shows that the spectrin motif forms a three-helix bundle, similar to the earliest conformational prediction based on the analysis of multiple homologous motifs (Speicher and Marchesi, Nature, 311:177 [1984]).

[0120] II. Variants and Homologs of Dystrophin

[0121] The present invention is not limited to the spectrin-like repeat encoding sequences SEQ ID NOS:8-10, 12-27, and 29-33, but specifically includes nucleic acid sequences capable of hybridizing to the spectrin-like repeat encoding sequences SEQ ID NOS:8-10, 12-27, and 29-33, (e.g. capable of hybridizing under high stringency conditions). Those skilled in the art know that different hybridization stringencies may be desirable. For example, whereas higher stringencies may be preferred to reduce or eliminate non-specific binding between the spectrin-like repeat encoding sequences SEQ ID NOS:8-10, 12-27, and 29-33, and other nucleic acid sequences, lower stringencies may be preferred to detect a larger number of nucleic acid sequences having different homologies to the nucleotide sequence of SEQ ID NOS:8-10, 12-27, and 29-33.

[0122] Accordingly, in some embodiments, the dystrophin spectrin-like repeats of the compositions of the present invention (e.g., SEQ ID NOs:8-10, 12-27, and 29-33) are replaced with different spectrin-like repeats, including, but not limited to, variants, homologs, truncations, and additions of dystrophin spectrin-like repeats. Candidate spectrin-like repeats are screened for activity using any suitable assay, including, but not limited to, those described below and in illustrative Examples 1 and 5.

[0123] A. Homologs

[0124] 1. Dystrophin From other Species

[0125] In some embodiments, the spectrin-like repeats of the gene constructs of the present invention are replaced with spectrin-like repeats of dystrophin from other species (e.g., homologs of dystrophin), including, but not limited to, those described herein. Homologs of dystrophin have been identified in a variety of organisms, including mouse (Genbank accession number M68859); dog (Genbank accession number AF070485); and chicken (Genbank accession number X13369). The spectrin-like repeats of the mouse dystrophin gene were compared to the human gene (See FIG. 11) and were shown to have significant homology. Similar comparisons can be generated with homologs from other species, including but not limited to, those described above, by using a variety of available computer programs (e.g., BLAST, from NCBI). Candidate homologs can be screened for biological activity using any suitable assay, including, but not limited to, those described herein.
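As a rough illustration of the kind of comparison referred to above, the sketch below computes percent identity between two sequences that have already been aligned. It is not BLAST and does not perform an alignment; it assumes that gaps are written as "-" and that gapped columns are excluded from the count. The function name and the short example sequences are invented for illustration and are not taken from any GenBank record.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which either sequence has a gap ('-') are skipped. This is one
    common convention; alignment programs may report identity differently.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy fragments, not real dystrophin sequence.
human_like = "ATGCTTTGGTGG"
mouse_like = "ATGCTATGGTGG"
identity = percent_identity(human_like, mouse_like)
print(f"{identity:.1f}% identity over ungapped columns")
```

A value computed this way could then be compared against the identity ranges given in the definition of stringency, bearing in mind that those ranges are described there only as approximate examples.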

[0126] 2. Utrophin

[0127] In some embodiments, the spectrin-like repeats of the geneconstructs of the present invention are replaced with spectrin-likerepeats from another peptide (e.g., homologs of dystrophin). Forexample, in some embodiments, spectrin-like repeats from the utrophinprotein (See e.g., Genbank accession number X69086; SEQ ID NO:3; FIG. 3)are utilized. Utrophin is an autosomally-encoded homolog of dystrophinand has been postulated that the proteins play a similar physiologicalrole (For a recent review, See e.g., Blake et al., Brain Pathology, 6:37[1996]). Human utrophin shows substantial homology to dystrophin, withthe major difference occurring in the rod domain, where utrophin lacksrepeats 15 and 19 and two hinge regions (See e.g., Love et al., Nature339:55 [1989]; Winder et al., FEBS Lett., 369:27 [1995]). Utrophin thuscontains 22 spectrin-like repeats and two hinge regions. A comparison ofthe rod domain of Utrophin and Dystrophin is shown in FIG. 38.

[0128] In addition, in some embodiments, spectrin-like repeats from a homolog of utrophin are utilized. Homologs of utrophin have been identified in a variety of organisms, including mouse (Genbank accession number Y12229; SEQ ID NO:4; FIG. 4) and rat (Genbank accession number AJ002967). The nucleic acid sequence of these or additional homologs can be compared to the nucleic acid sequence of human utrophin using any suitable methods, including, but not limited to, those described above. Candidate spectrin-like repeats from human utrophin or utrophin homologs can be screened for biological activity using any suitable assay, including, but not limited to, those described herein.

[0129] 3. Alpha-actinin

[0130] In some embodiments, spectrin-like repeats from dystrophin are replaced with spectrin-like repeats from alpha-actinin. The microfilament protein alpha-actinin exists as a dimer. The N-terminal regions of both polypeptides, arranged in antiparallel orientation, comprise the actin-binding regions, while the C-terminal, larger parts consist of four spectrin-like repeats that interact to form a rod-like structure (See e.g., Winkler et al., Eur. J. Biochem., 248:193 [1997]). In some embodiments, human alpha-actinin spectrin-like repeats are utilized (Genbank accession number M86406; SEQ ID NO:87; FIG. 16). In other embodiments, alpha-actinin homologs from other organisms are utilized (e.g., mouse (Genbank accession number AJ289242); Xenopus (Genbank accession number BE576799); and rat (Genbank accession number AF190909)).

[0131] B. Variants

[0132] Still other embodiments of the present invention provide mutant or variant forms of spectrin-like repeats (i.e., muteins). It is possible to modify the structure of a peptide having an activity of spectrin-like repeats for such purposes as enhancing therapeutic or prophylactic efficacy, or stability (e.g., ex vivo shelf life, and/or resistance to proteolytic degradation in vivo). Such modified peptides provide additional peptides having a desired activity of the subject spectrin-like repeats as defined herein. A modified peptide can be produced in which the amino acid sequence has been altered, such as by amino acid substitution, deletion, or addition.

[0133] Moreover, as described above, variant forms (e.g., mutants) of the subject spectrin-like repeats are also contemplated as finding use in the present invention. For example, it is contemplated that an isolated replacement of a leucine with an isoleucine or valine, an aspartate with a glutamate, a threonine with a serine, or a similar replacement of an amino acid with a structurally related amino acid (i.e., conservative mutations) will not have a major effect on the biological activity of the resulting molecule. Accordingly, some embodiments of the present invention provide variants of spectrin-like repeats containing conservative replacements. Conservative replacements are those that take place within a family of amino acids that are related in their side chains. Genetically encoded amino acids can be divided into four families: (1) acidic (aspartate, glutamate); (2) basic (lysine, arginine, histidine); (3) nonpolar (alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan); and (4) uncharged polar (glycine, asparagine, glutamine, cysteine, serine, threonine, tyrosine). Phenylalanine, tryptophan, and tyrosine are sometimes classified jointly as aromatic amino acids. In similar fashion, the amino acid repertoire can be grouped as (1) acidic (aspartate, glutamate); (2) basic (lysine, arginine, histidine); (3) aliphatic (glycine, alanine, valine, leucine, isoleucine, serine, threonine), with serine and threonine optionally being grouped separately as aliphatic-hydroxyl; (4) aromatic (phenylalanine, tyrosine, tryptophan); (5) amide (asparagine, glutamine); and (6) sulfur-containing (cysteine and methionine) (See e.g., Stryer (ed.), Biochemistry, 2nd ed, W H Freeman and Co. [1981]). Whether a change in the amino acid sequence of a peptide results in a functional homolog can be readily determined by assessing the ability of the variant peptide to function in a fashion similar to the wild-type protein. Peptides in which more than one replacement has taken place can readily be tested in the same manner.
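The four-family grouping recited above can be expressed, as a non-limiting sketch, in Python using one-letter amino acid codes. The names `FAMILIES`, `family_of`, and `is_conservative` are hypothetical and chosen only for illustration. The sketch flags whether a single substitution stays within one family; it does not predict activity, which is determined by assay as described above.

```python
# Four side-chain families as recited in the text, in one-letter codes.
FAMILIES = {
    "acidic": "DE",                # aspartate, glutamate
    "basic": "KRH",                # lysine, arginine, histidine
    "nonpolar": "AVLIPFMW",        # Ala, Val, Leu, Ile, Pro, Phe, Met, Trp
    "uncharged_polar": "GNQCSTY",  # Gly, Asn, Gln, Cys, Ser, Thr, Tyr
}


def family_of(residue: str) -> str:
    """Return the family name for a one-letter amino acid code."""
    code = residue.upper()
    for name, members in FAMILIES.items():
        if code in members:
            return name
    raise ValueError(f"not a genetically encoded amino acid: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two residues differ but belong to the same family."""
    if original.upper() == replacement.upper():
        return False  # no change at all is not a replacement
    return family_of(original) == family_of(replacement)


# Replacements named in the text: Leu->Ile, Asp->Glu, Thr->Ser.
for pair in [("L", "I"), ("D", "E"), ("T", "S"), ("D", "K")]:
    print(pair, is_conservative(*pair))
```

The alternative six-group classification recited above could be substituted by changing only the `FAMILIES` table.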

[0134] The present invention further contemplates a method of generating sets of combinatorial mutants of the present spectrin-like repeats, as well as truncation mutants, and is especially useful for identifying potential variant sequences (i.e., homologs) that possess the biological activity of spectrin-like repeats (e.g., a decrease in muscle necrosis). In addition, screening such combinatorial libraries is used to generate, for example, novel spectrin-like repeat homologs that possess novel biological activities altogether.

[0135] Therefore, in some embodiments of the present invention, spectrin-like repeat homologs are engineered by the present method to produce homologs with enhanced biological activity. In other embodiments of the present invention, combinatorially-derived homologs are generated which provide spectrin-like repeats that are easier to express and transfer to host cells. Such spectrin-like repeats, when expressed from recombinant DNA constructs, can be used in therapeutic embodiments of the invention described below.

[0136] Still other embodiments of the present invention provide spectrin-like repeat homologs which have intracellular half-lives dramatically different than the corresponding wild-type protein. For example, the altered proteins comprising the spectrin-like repeat homologs are rendered either more stable or less stable to proteolytic degradation or other cellular processes that result in destruction of, or otherwise inactivate, spectrin-like repeats. Such homologs, and the genes that encode them, can be utilized to alter the pharmaceutical activity of constructs expressing spectrin-like repeats by modulating the half-life of the protein. For instance, a short half-life can give rise to more transient biological effects. As above, such proteins find use in pharmaceutical applications of the present invention.

[0137] In some embodiments of the combinatorial mutagenesis approach of the present invention, the amino acid sequences for a population of spectrin-like repeat homologs are aligned, preferably to promote the highest homology possible. Such a population of variants can include, for example, spectrin-like repeat homologs from one or more species, or spectrin-like repeat homologs from different proteins of the same species (e.g., including, but not limited to, those described above). Amino acids that appear at each position of the aligned sequences are selected to create a degenerate set of combinatorial sequences.
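As a minimal sketch of the selection step just described, the following Python collects the amino acids observed at each position of a gap-free alignment and enumerates every combination, which gives the degenerate set at the peptide level. The three-residue alignment is a hypothetical toy input, and the function names are illustrative assumptions; real repeats are far longer, so in practice the library size would usually be computed without enumerating the library.

```python
from itertools import product
from math import prod


def residues_per_position(aligned: list[str]) -> list[list[str]]:
    """Collect the distinct residues seen at each column of a gap-free alignment."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [sorted(set(column)) for column in zip(*aligned)]


def library_size(positions: list[list[str]]) -> int:
    """Number of distinct sequences in the degenerate set."""
    return prod(len(choices) for choices in positions)


def enumerate_library(positions: list[list[str]]) -> list[str]:
    """Every combination of the residues observed at each position."""
    return ["".join(combo) for combo in product(*positions)]


# Hypothetical toy alignment of three homolog fragments.
alignment = ["LKE", "IKD", "LRE"]
positions = residues_per_position(alignment)
library = enumerate_library(positions)
print(positions, library_size(positions), library)
```

Note that the degenerate set contains sequences (such as "IRD" in the toy input) that appear in none of the aligned homologs, which is the source of the novel variants contemplated above.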

[0138] In a preferred embodiment of the present invention, the combinatorial spectrin-like repeat library is produced by way of a degenerate library of genes encoding a library of polypeptides that each include at least a portion of candidate spectrin-like repeat sequences. For example, a mixture of synthetic oligonucleotides is enzymatically ligated into gene sequences such that the degenerate set of candidate spectrin-like repeat sequences are expressible as individual polypeptides, or alternatively, as a set of larger fusion proteins (e.g., for phage display) containing the set of spectrin-like repeat sequences therein.

[0139] There are many ways by which the library of potential spectrin-like repeat homologs can be generated from a degenerate oligonucleotide sequence. In some embodiments, chemical synthesis of a degenerate gene sequence is carried out in an automatic DNA synthesizer, and the synthetic genes are ligated into an appropriate gene for expression. The purpose of a degenerate set of genes is to provide, in one mixture, all of the sequences encoding the desired set of potential spectrin-like repeat sequences. The synthesis of degenerate oligonucleotides is well known in the art (See e.g., Narang, Tetrahedron Lett., 39:3 9 [1983]; Itakura et al., Recombinant DNA, in Walton (ed.), Proceedings of the 3rd Cleveland Symposium on Macromolecules, Elsevier, Amsterdam, pp 273-289 [1981]; Itakura et al., Annu. Rev. Biochem., 53:323 [1984]; Itakura et al., Science 198:1056 [1984]; Ike et al., Nucl. Acid Res., 11:477 [1983]). Such techniques have been employed in the directed evolution of other proteins (See e.g., Scott et al., Science, 249:386-390 [1990]; Roberts et al., Proc. Natl. Acad. Sci. USA, 89:2429-2433 [1992]; Devlin et al., Science, 249:404-406 [1990]; Cwirla et al., Proc. Natl. Acad. Sci. USA, 87:6378-6382 [1990]; as well as U.S. Pat. Nos. 5,223,409, 5,198,346, and 5,096,815, each of which is incorporated herein by reference).

[0140] A wide range of techniques are known in the art for screening gene products of combinatorial libraries made by point mutations, and for screening cDNA libraries for gene products having a certain property. Such techniques are generally adaptable for rapid screening of the gene libraries generated by the combinatorial mutagenesis of spectrin-like repeat homologs. The most widely used techniques for screening large gene libraries typically comprise cloning the gene library into replicable expression vectors, transforming appropriate cells with the resulting library of vectors, and expressing the combinatorial genes under conditions in which detection of a desired activity facilitates relatively easy isolation of the vector encoding the gene whose product was detected. Each of the illustrative assays described below is amenable to high-throughput analysis as necessary to screen large numbers of degenerate sequences created by combinatorial mutagenesis techniques.

[0141] Accordingly, in one embodiment of the present invention, the candidate genes comprising altered spectrin-like repeats are displayed on the surface of a cell or viral particle, and the ability of particular cells or viral particles to bind to another member of the DPC complex (e.g., actin) is assayed. In other embodiments of the present invention, the gene library is cloned into the gene for a surface membrane protein of a bacterial cell, and the resulting fusion protein is detected by panning (WO 88/06630; Fuchs et al., BioTechnol., 9:1370 [1991]; and Goward et al., TIBS 18:136 [1992]). In other embodiments of the present invention, fluorescently labeled molecules that bind proteins comprising spectrin-like repeats (e.g., actin) can be used to score for potentially functional spectrin-like repeat homologs. Cells are visually inspected and separated under a fluorescence microscope, or, where the morphology of the cell permits, separated by a fluorescence-activated cell sorter.

[0142] In an alternate embodiment of the present invention, the gene library is expressed as a fusion protein on the surface of a viral particle. For example, foreign peptide sequences are expressed on the surface of infectious phage in the filamentous phage system, thereby conferring two significant benefits. First, since these phage can be applied to affinity matrices at very high concentrations, a large number of phage can be screened at one time. Second, since each infectious phage displays the combinatorial gene product on its surface, if a particular phage is recovered from an affinity matrix in low yield, the phage can be amplified by another round of infection. The group of almost identical E. coli filamentous phages M13, fd, and f1 are most often used in phage display libraries, as either of the phage gIII or gVIII coat proteins can be used to generate fusion proteins without disrupting the ultimate packaging of the viral particle (See e.g., WO 90/02909; WO 92/09690; Marks et al., J. Biol. Chem., 267:16007 [1992]; Griffiths et al., EMBO J., 12:725 [1993]; Clackson et al., Nature, 352:624 [1991]; and Barbas et al., Proc. Natl. Acad. Sci., 89:4457 [1992]).

[0143] In another embodiment of the present invention, the recombinant phage antibody system (e.g., RPAS, Pharmacia Catalog number 27-9400-01) is modified for use in expressing and screening of spectrin-like repeat combinatorial libraries. The pCANTAB 5 phagemid of the RPAS kit contains the gene that encodes the phage gIII coat protein. In some embodiments of the present invention, the spectrin-like repeat combinatorial gene library is cloned into the phagemid adjacent to the gIII signal sequence such that it is expressed as a gIII fusion protein. In other embodiments of the present invention, the phagemid is used to transform competent E. coli TG1 cells after ligation. In still other embodiments of the present invention, transformed cells are subsequently infected with M13KO7 helper phage to rescue the phagemid and its candidate spectrin-like repeat gene insert. The resulting recombinant phage contain phagemid DNA encoding a specific candidate spectrin-like repeat and display one or more copies of the corresponding fusion coat protein. In some embodiments of the present invention, the phage-displayed candidate proteins that are capable of, for example, binding to actin, are selected or enriched by panning. The bound phage is then isolated, and if the recombinant phage express at least one copy of the wild type gIII coat protein, they will retain their ability to infect E. coli. Thus, successive rounds of reinfection of E. coli and panning will greatly enrich for spectrin-like repeat homologs, which can then be screened for further biological activities.

[0144] In light of the present disclosure, other forms of mutagenesis generally applicable will be apparent to those skilled in the art in addition to the aforementioned rational mutagenesis based on conserved versus non-conserved residues. For example, spectrin-like repeat homologs can be generated and screened using, for example, alanine scanning mutagenesis and the like (Ruf et al., Biochem., 33:1565 [1994]; Wang et al., J. Biol. Chem., 269:3095 [1994]; Balint et al., Gene 137:109 [1993]; Grodberg et al., Eur. J. Biochem., 218:597 [1993]; Nagashima et al., J. Biol. Chem., 268:2888 [1993]; Lowman et al., Biochem., 30:10832 [1991]; and Cunningham et al., Science, 244:1081 [1989]), by linker scanning mutagenesis (Gustin et al., Virol., 193:653 [1993]; Brown et al., Mol. Cell. Biol., 12:2644 [1992]; McKnight et al., Science, 232:316); or by saturation mutagenesis (Meyers et al., Science, 232:613 [1986]).

[0145] C. Truncations and Additions

[0146] In yet other embodiments of the present invention, the spectrin-like repeats of human dystrophin are replaced by truncations or additions of spectrin-like repeats from dystrophin or another protein, including, but not limited to, those described above. Accordingly, in some embodiments, amino acids are truncated from either end of one or more of the spectrin-like repeats in a given construct. The activity of truncation mutants is determined using any suitable assay, including, but not limited to, those disclosed herein.

[0147] In some embodiments, additional amino acids are added to either or both ends of the spectrin-like repeats in a given construct. In some embodiments, single amino acids are added and the activity of the construct is determined. Amino acids may be added to one or more of the spectrin-like repeats in a given construct. The activity of spectrin-like repeats comprising additional amino acids is determined using any suitable assay, including, but not limited to, those disclosed herein.

[0148] III. Carboxy-Terminal Domain Truncated Dystrophin Genes

[0149] In some embodiments, the present invention provides compositions comprising nucleic acid, wherein the nucleic acid encodes a mini-dystrophin peptide, and wherein the mini-dystrophin peptide comprises a substantially deleted dystrophin C-terminal domain (e.g., 55% of the dystrophin C-terminal domain is missing). In some embodiments, this type of truncation prevents the mini-dystrophin peptide from binding both syntrophin and dystrobrevin.

[0150] The dystrophin COOH-terminal domain is located adjacent to the cysteine-rich domain, and contains an alternatively spliced region and two coiled-coil motifs (Blake et al., Trends Biochem. Sci., 20:133, 1995). The alternatively spliced region binds three isoforms of syntrophin in muscle, while the coiled-coil motifs bind numerous members of the dystrobrevin family (Sadoulet-Puccio et al., PNAS, 94:12413, 1997). The dystrobrevins display significant homology with the COOH-terminal region of dystrophin, and the larger dystrobrevin isoforms also bind to the syntrophins. The importance and functional significance of syntrophin and dystrobrevin remain largely unknown, although they may be involved in cell signaling pathways (Grady et al., Nat. Cell. Biol., 1:215, 1999).

[0151] Researchers have previously generated transgenic mdx mouse strains expressing dystrophins deleted for either the syntrophin or the dystrobrevin binding domain (Rafael et al., Hum. Mol. Genet., 3:1725, 1994; and Rafael et al., J. Cell Biol., 134:93, 1996). These mice displayed normal muscle function and essentially normal localization of syntrophin, dystrobrevin, and nNOS. Thus, while dystrobrevin appears to protect muscle from damage (Grady et al., Nat. Cell. Biol., 1:215, 1999), removal of the dystrobrevin binding site from dystrophin does not result in a dystrophy. Subsequent studies revealed that syntrophin and dystrobrevin bind each other in addition to dystrophin, so that removal of only one of the two binding sites on dystrophin might not sever the link between dystrophin, syntrophin and dystrobrevin. Surprisingly, the transgenic mice according to the present invention (See Example 1) displayed normal muscle function even though they lacked both the syntrophin and dystrobrevin binding sites.

[0152] IV. MCK Regulatory Regions

[0153] In certain embodiments, nucleic acid encoding mini-dystrophin peptides of the present invention is operably linked to muscle creatine kinase gene (MCK) regulatory regions and control elements, as well as mutated forms of these regions and elements (See U.S. Provisional App. Ser. No. 60/218,436, filed Jul. 14, 2000, and International Application PCT/US01/22092, filed Jul. 13, 2001, both of which are hereby incorporated by reference). In some embodiments, the nucleic acid encoding mini-dystrophin peptides is operably linked to these sequences to provide muscle specificity and reduced size such that the resulting construct is able to fit into, for example, a viral vector (e.g. adeno-associated virus). MCK gene regulatory regions (e.g. promoters and enhancers) display striated muscle-specific activity and have been characterized in vitro and in vivo. The major known regulatory regions in the mouse MCK gene include a 206 base pair muscle-specific enhancer located approximately 1.1 kb 5′ of the transcription start site in mouse (i.e. SEQ ID NO:87) and a 358 base pair proximal promoter (i.e. SEQ ID NO:93) [Shield, et al. Mol. Cell. Biol., 16:5058 (1996)]. A larger MCK promoter region may also be employed (e.g. SEQ ID NO:92), as well as smaller MCK promoter regions (e.g. SEQ ID NO:94).

[0154] The 206 base pair MCK enhancer (SEQ ID NO:87) contains a number of sequence motifs, including two classes of E-boxes (MCK-L and MCK-R), CArG, and AT-rich sites. Similar E-box sequences are found in the enhancers of the human, rat, and rabbit MCK genes [See, Trask, et al., Nucleic Acids Res., 20:2313 (1992)]. Mutations may be made to this sequence by, for example, inserting an additional MCK-R control element into a wild-type enhancer sequence naturally containing one MCK-R control element (such that the resulting sequence has at least two MCK-R control elements). For example, the inserted MCK-R control element replaces the endogenous MCK-L control element. The 206 base pair mouse enhancer (SEQ ID NO:2) may be modified by replacing the left E-box (MCK-L) with a right E-Box (MCK-R) to generate a mutant muscle-specific enhancer region (e.g. to generate SEQ ID NO:88). A similar approximately 200 base pair wild type enhancer region in human may be modified by replacing the left E-box with an MCK-R to generate a mutant muscle-specific enhancer region (e.g. 2R human enhancer regions).

[0155] Another modification that may be made to generate mutant muscle-specific enhancer regions is inserting the S5 sequence GAGCGGTTA (SEQ ID NO:95) into wild type mouse, human, and rat enhancer sequences. Making such a modification to the mouse enhancer SEQ ID NO:87, for example, generates S5 mutant muscle-specific enhancer regions (e.g. SEQ ID NO:89). Another modification that may be made, for example, to the wild type mouse enhancer is replacing the left E-box (MCK-L) with a right E-Box (MCK-R), and also inserting the S5 sequence, to generate 2RS5 type sequences (e.g., in mouse, SEQ ID NO:90). These mutant muscle-specific enhancer regions may have additional sequences added to them or sequences taken away. For example, the mutant muscle-specific enhancer regions may have a portion of the sequence removed (e.g. the 3′ 41 base pairs). An example of such a mutant truncated 2RS5 sequence in mouse is SEQ ID NO:91 with the 3′ 41 base pairs removed, generating mutant truncated 2RS5 muscle-specific enhancer regions.

[0156] Any of these wild-type or mutant muscle-specific enhancer regions described above may be further modified to produce additional mutants. These additional mutants include, but are not limited to, muscle-specific enhancer regions having deletions, insertions or substitutions of different nucleotides or nucleotide analogs so long as the transcriptional activity of the enhancer region is maintained. Guidance in determining which and how many nucleotide bases may be substituted, inserted or deleted without abolishing the transcriptional activity may be found using computer programs well known in the art, for example, DNAStar software or GCG (Univ. of Wisconsin), or may be determined empirically using assays provided by the present invention.

[0157] V. Expression Vectors

[0158] The present invention contemplates the use of expression vectors with the compositions and methods of the present invention (e.g. with the nucleic acid constructs encoding the mini-dystrophin peptides). Vectors suitable for use with the methods and compositions of the present invention, for example, should be able to adequately package and carry the compositions and cassettes described herein. A number of suitable vectors are known in the art including, but not limited to, the following: 1) Adenoviral Vectors; 2) Second Generation Adenoviral Vectors; 3) Gutted Adenoviral Vectors; 4) Adeno-Associated Virus Vectors; and 5) Lentiviral Vectors.

[0159] Those skilled in the art will recognize and appreciate that other vectors are suitable for use with methods and compositions of the present invention. Indeed, the present invention is not intended to be limited to the use of the recited vectors; as such, alternative means for delivering the compositions of the present invention are contemplated. For example, in various embodiments, the compositions of the present invention are associated with retrovirus vectors and herpes virus vectors, plasmids, cosmids, artificial yeast chromosomes, mechanical, electrical, and chemical transfection methods, and the like. Exemplary delivery approaches are discussed below.

[0160] 1. Adenoviral Vectors

[0161] Self-propagating adenovirus (Ad) vectors have been extensively utilized to deliver foreign genes to a great variety of cell types in vitro and in vivo. “Self-propagating viruses” are those which can be produced by transfection of a single piece of DNA (the recombinant viral genome) into a single packaging cell line to produce infectious virus; self-propagating viruses do not require the use of helper virus for propagation. As with many vectors, adenoviral vectors have limitations on the amount of heterologous nucleic acid they are capable of delivering to cells. For example, the capacity of adenovirus is approximately 8-10 kb, the capacity of adeno-associated virus is approximately 4.8 kb, and the capacity of lentivirus is approximately 8.9 kb. Thus, the mutants of the present invention that provide shorter nucleic acid sequences encoding the mini-dystrophin peptides (compared to full length wild-type dystrophin (14 kb)) improve the carrying capacity of such vectors.
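The size constraint described above can be illustrated with a short Python sketch that checks which vectors could carry an insert of a given length. The capacities are the approximate figures recited in this paragraph; taking the lower bound of the 8-10 kb adenovirus range is an assumption made here to keep the check conservative, and the 4.5 kb and 6.0 kb insert sizes are hypothetical values used only as examples.

```python
# Approximate capacities (kb) as recited in the text. For adenovirus the
# lower bound of the stated 8-10 kb range is used (a conservative assumption).
VECTOR_CAPACITY_KB = {
    "adenovirus": 8.0,
    "adeno-associated virus": 4.8,
    "lentivirus": 8.9,
}

FULL_LENGTH_DYSTROPHIN_KB = 14.0  # full length wild-type dystrophin, per the text


def vectors_that_fit(insert_kb: float) -> list[str]:
    """Names of vectors whose approximate capacity is at least insert_kb."""
    return [name for name, cap in VECTOR_CAPACITY_KB.items() if insert_kb <= cap]


print(vectors_that_fit(FULL_LENGTH_DYSTROPHIN_KB))  # full-length cDNA
print(vectors_that_fit(6.0))                        # hypothetical 6.0 kb cassette
print(vectors_that_fit(4.5))                        # hypothetical 4.5 kb cassette
```

In practice the insert length would include the promoter, enhancer, and polyadenylation sequences in addition to the coding sequence.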

[0162] 2. Second Generation Adenoviral Vectors

[0163] In an effort to address the viral replication problems associated with first generation Ad vectors, so called “second generation” Ad vectors have been developed. Second generation Ad vectors delete the early regions of the Ad genome (E2A, E2B, and E4). Highly modified second generation Ad vectors are less likely to generate replication-competent virus during large-scale vector preparation, and complete inhibition of Ad genome replication should abolish late gene replication. Host immune response against late viral proteins is thus reduced [See Amalfitano et al., “Production and Characterization of Improved Adenovirus Vectors With the E1, E2b, and E3 Genes Deleted,” J. Virol. 72:926-933 (1998)]. The elimination of E2A, E2B, and E4 genes from the Ad genome also provides increased cloning capacity. The deletion of two or more of these genes from the Ad genome allows, for example, the delivery of full length or cDNA dystrophin genes via Ad vectors [Kumar-Singh et al., Hum. Mol. Genet., 5:913 (1996)].

[0164] 3. Gutted Adenoviral Vectors

[0165] “Gutted,” or helper dependent, Ad vectors contain cis-acting DNA sequences that direct adenoviral replication and packaging but do not contain viral coding sequences [See Fisher et al., “Recombinant Adenovirus Deleted of All Viral Genes for Gene Therapy of Cystic Fibrosis,” Virology 217:11-22 (1996) and Kochanek et al., “A New Adenoviral Vector: Replacement of All Viral Coding Sequences With 28 kb of DNA Independently Expressing Both Full-length Dystrophin and Beta-galactosidase,” Proc. Nat. Acad. Sci. USA 93:5731-5736 (1996)]. Gutted vectors are defective viruses produced by replication in the presence of a helper virus, which provides all of the necessary viral proteins in trans. Since gutted vectors do not contain any viral genes, expression of viral proteins is not possible.

[0166] Recent developments have advanced the field of gutted vector production [See Hardy et al., “Construction of Adenovirus Vectors Through Cre-lox Recombination,” J. Virol. 71:1842-1849 (1997) and Hartigan-O'Connor et al., “Improved Production of Gutted Adenovirus in Cells Expressing Adenovirus Preterminal Protein and DNA Polymerase,” J. Virol. 73:7835-7841 (1999)]. Gutted Ad vectors are able to maximally accommodate up to about 37 kb of exogenous DNA; however, 28-30 kb is more typical. For example, a gutted Ad vector can accommodate the full length dystrophin or cDNA, but also expression cassettes or modulator proteins.

[0167] 4. Adeno-Associated Virus Vectors

[0168] In preferred embodiments, the nucleic acid encoding the mini-dystrophin peptides of the present invention is inserted in adeno-associated vectors (AAV vectors). AAV vectors evade a host's immune response and achieve persistent gene expression through avoidance of the antigenic presentation by the host's professional APCs such as dendritic cells. Most AAV genomes in muscle tissue are present in the form of large circular multimers. AAVs are only able to carry about 5 kb of exogenous DNA. As such, the nucleic acid of the present invention encoding the mini-dystrophin peptides is well suited, in some embodiments, for insertion into these vectors due to the reduced size of the nucleic acid sequences.

[0169] The dystrophin expression cassettes of the present invention (containing nucleic acid encoding mini-dystrophin peptides) may be cloned into any of a variety of cis-acting plasmid vectors that contain the adeno-associated virus inverted terminal repeats (ITRs) to allow production of infectious virus. For example, one such plasmid is the cis-acting plasmid (pCisAV) (Yan et al., PNAS, 97:6716-6721, 2000). This plasmid contains the AAV-ITRs separated by a NotI cloning site. The ITR elements were derived from pSub201, a recombinant plasmid from which an infectious adeno-associated virus genome can be excised in vitro and used to study viral replication. After ligation of the dystrophin expression cassette (isolated as a NotI fragment from pCK6DysR4-23-71-78An) into NotI-digested pCisAV, rAAV stocks are generated by cotransfection of pCisAV.CK6DysR4-23-71-78An and pRep/Cap (Fisher, et al. J. Virol. 70:520-532, 1996) together with coinfection of the recombinant adenovirus Ad.CMVlacZ into 293 cells. Recombinant AAV vector, for example, may then be purified on CsCl gradients as described (Duan, et al., Virus Res. 48:41-56, 1997).

[0170] 5. Lentiviral Vectors

[0171] Vectors based on human or feline lentiviruses have emerged as another vector useful for gene therapy applications. Lentivirus-based vectors infect nondividing cells as part of their normal life cycles, and are produced by expression of a package-able vector construct in a cell line that expresses viral proteins. The small size of lentiviral particles constrains the amount of exogenous DNA they are able to carry to about 10 kb. However, once again, the small size of the nucleic acid encoding the mini-dystrophin peptides of the present invention allows such vectors to be employed.

[0172] 6. Retroviruses

[0173] Vectors based on Moloney murine leukemia viruses (MMLV) and other retroviruses have emerged as useful for gene therapy applications. These vectors stably transduce actively dividing cells as part of their normal life cycles, and integrate into host cell chromosomes. Retroviruses may be employed with the compositions of the present invention (e.g. gene therapy), for example, in the context of infection and transduction of muscle precursor cells such as myoblasts, satellite cells, or other muscle stem cells.

[0174] EXPERIMENTAL

[0175] The following examples are provided in order to demonstrate and further illustrate certain preferred embodiments and aspects of the present invention and are not to be construed as limiting the scope thereof.

[0176] In the experimental disclosure which follows, the following abbreviations apply: N (normal); M (molar); mM (millimolar); μM (micromolar); mol (moles); mmol (millimoles); μmol (micromoles); nmol (nanomoles); pmol (picomoles); g (grams); mg (milligrams); μg (micrograms); ng (nanograms); l or L (liters); ml (milliliters); μl (microliters); cm (centimeters); mm (millimeters); μm (micrometers); nm (nanometers); ° C. (degrees Centigrade); and Sigma (Sigma Chemical Co., St. Louis, Mo.).

EXAMPLE 1 Carboxy-Terminal Domain Truncated Dystrophin Genes

[0177] This example describes the generation of carboxy-terminal truncated dystrophin nucleic acid sequences. In particular, this example describes the construction of a dystrophin nucleic acid sequence with the entire carboxy-terminal domain deleted, and testing of this sequence in a mouse model for DMD.

[0178] A. Methods

[0179] The bases encoding amino acids 3402-3675 (corresponding to exons 71-78) were deleted from the full length murine dystrophin cDNA (SEQ ID NO:2, accession No. M68859) by recombinant PCR, leaving the last three amino acids (exon 79) of the dystrophin protein unaltered. This dystrophin Δ71-78 cDNA was cloned into an expression vector containing bases −2139 to +239 of the human-skeletal actin (HSA) promoter (Brennan, et al. J. Biol. Chem. 268:719, 1993). A splice acceptor from the SV40 VP1 intron (isolated as a 400 bp HindIII/XbaI fragment from pSVL; Amersham Pharmacia Biotech) was inserted immediately 3′ of the HSA fragment, and the SV40 polyadenylation signal (isolated as a BamHI fragment from pCMVβ; MacGregor and Caskey, Nuc. Acid. Res., 17:2365, 1989) was inserted 3′ of the dystrophin cDNA. The excised dystrophin Δ71-78 expression cassette was injected into wild-type C57Bl/10×SJL/J F2 hybrid embryos, and F₀ mice were screened by PCR. Five positive F₀'s were backcrossed onto the C57Bl/10mdx background, and the line with the most uniform expression levels was selected for analysis. Also employed were previously described transgenic mdx mice that express dystrophin constructs deleted approximately for exons 71-74 (Δ71-74) or exons 75-78 (Δ75-78), which remove amino acids 3402-3511 and 3528-3675, respectively (See Rafael et al., J. Cell Biol., 134:93-102, 1996). Transgenic mdx line Dp71 expresses the Dp71 isoform of dystrophin in striated muscle (Cox et al., Nat. Genet., 8:333-339, 1994).
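The size of each deletion follows directly from the amino acid ranges given above, counting both endpoints and three bases per codon. The short Python sketch below performs this arithmetic; the function name is a hypothetical label, and the calculation assumes only that the recited ranges are inclusive.

```python
def deleted_bases(first_residue: int, last_residue: int) -> int:
    """Coding bases removed when residues first..last (inclusive) are deleted."""
    if last_residue < first_residue:
        raise ValueError("last_residue must not precede first_residue")
    residues = last_residue - first_residue + 1
    return residues * 3  # three bases per codon


# Amino acid ranges as recited in the text.
deletions = {
    "Δ71-78": (3402, 3675),
    "Δ71-74": (3402, 3511),
    "Δ75-78": (3528, 3675),
}
for name, (first, last) in deletions.items():
    print(name, last - first + 1, "residues,", deleted_bases(first, last), "bases")
```

On this arithmetic the Δ71-78 deletion removes 274 codons (822 bases) from the coding sequence.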

[0180] i. Morphology Methods

[0181] Quadriceps, soleus, extensor digitorum longus (EDL), tibialis anterior, and diaphragm muscles were removed from the mice, frozen in liquid nitrogen-cooled O.C.T. embedding medium (Tissue-Tek), and cut into 7-μm sections. After fixing in 3.7% formaldehyde, sections were stained in hematoxylin and eosin-phloxine. Stained sections were imaged with a Nikon E1000 microscope connected to a Spot-2 CCD camera. To determine the percentage of fibers containing central nuclei, the number of muscle fibers with centrally-located nuclei was divided by the total number of muscle fibers.

[0182] ii. Evans Blue Assays

[0183] Four-month-old control mice and Δ71-78 mice were analyzed after injection with Evans blue, as described previously (Straub et al., J. Cell. Biol., 139:375-385, 1997). In brief, mice were tail vein-injected with 150 μl of a solution containing 10 mg/ml Evans blue dye in PBS (150 mM NaCl, 50 mM Tris, pH 7.4). After 3 hours, the animals were euthanized and mouse tissues were either fixed in 3.7% formaldehyde/0.5% glutaraldehyde to observe gross dye uptake, or frozen unfixed in O.C.T. embedding medium. To examine Evans blue uptake by individual fibers, 7-μm-thick frozen sections were fixed in cold acetone and analyzed by fluorescence microscopy.

[0184] iii. Immunofluorescence Assays

[0185] Quadriceps and diaphragm muscles from C57Bl/10, mdx, and Δ71-78 mice were removed, frozen in O.C.T. embedding medium, and cut into 7-μm sections. Immunofluorescence was performed with previously described antibodies against dystrophin (NH₂ terminus), α1-syntrophin (SYN17), β1-syntrophin, α-dystrobrevin-1 (DB670), α-dystrobrevin-2 (DB2), and utrophin. After incubation with primary antibodies, cryosections were incubated with an FITC-conjugated goat anti-rabbit secondary antibody and fluorescent images were viewed on a Nikon E1000 microscope. Antibodies to α-sarcoglycan (Rabbit 98), β-sarcoglycan (Goat 26), γ-sarcoglycan (Rabbit 245), δ-sarcoglycan (Rabbit 215), sarcospan (Rabbit 235), α-dystroglycan (Goat 20), β-dystroglycan (AP 83), and nNOS (Rabbit 200) have been described previously (Duclos et al., J. Cell. Biol., 142:1461, 1998). Cy3-conjugated secondary antibodies were used and images were viewed on a Bio-Rad MRC-600 laser scanning confocal microscope. All digitized images were captured under the same conditions.

[0186] iv. Measurements of Contractile Properties Methods

[0187] Contractile properties of muscles from 6-month-old Δ71-78 transgenic mice were compared with those of C57Bl/10 wild-type and mdx mice using methods described previously (Lynch et al., Am. J. Physiol., 272:C2063, 1997). The samples included eight muscles each from the EDL, soleus, and diaphragm. Mice were deeply anesthetized with avertin and each muscle was isolated and dissected free from the mouse. After removal of the limb muscles, the mice were euthanized with the removal of the diaphragm muscle. The muscles were immersed in a bath filled with oxygenated buffered mammalian Ringer's solution (137 mM NaCl, 24 mM NaHCO₃, 11 mM glucose, 5 mM KCl, 2 mM CaCl₂, 1 mM MgSO₄, 1 mM NaH₂PO₄, and 0.025 mM tubocurarine chloride, pH 7.4). For each muscle, one tendon was tied to a servomotor and the other tendon to a force transducer. Muscles were stretched from slack length to the optimal length for force development and then stimulated at a frequency that produced absolute isometric tetanic force (mN). After the measurements of the contractile properties, the muscles were removed from the bath, blotted, and weighed to determine muscle mass. Specific force (kN/m²) was calculated by dividing absolute force by total fiber cross-sectional area.
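The specific-force calculation described above can be sketched in Python as follows. This is only an illustration: the function name and the example numbers are hypothetical, and the cross-sectional area is taken as a given input because the text does not state how it was derived. With force in mN and area in mm², the quotient is numerically equal to kN/m².

```python
def specific_force_kn_per_m2(absolute_force_mn: float, fiber_csa_mm2: float) -> float:
    """Return specific force in kN/m^2.

    1 mN/mm^2 = 1e-3 N / 1e-6 m^2 = 1e3 N/m^2 = 1 kN/m^2,
    so dividing mN by mm^2 needs no further scaling.
    """
    if fiber_csa_mm2 <= 0:
        raise ValueError("cross-sectional area must be positive")
    return absolute_force_mn / fiber_csa_mm2


# Hypothetical values for illustration only (not data from this disclosure).
example_specific_force = specific_force_kn_per_m2(absolute_force_mn=400.0,
                                                  fiber_csa_mm2=2.0)
```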

[0188] v. Muscle Membrane Isolation Methods

[0189] Muscle microsomes from 12-14-month-old C57Bl/10, mdx, Δ71-78, Δ71-74, Δ75-78, and Dp71 mice were prepared as described previously (Ohlendieck et al., J. Cell. Biol., 112:135, 1991). In brief, skeletal muscle was homogenized in 7.5 volumes of homogenization buffer plus the protease inhibitor Complete (Boehringer). The homogenate was centrifuged at 14,000 g for 15 min to remove cellular debris. The supernatant was filtered through cheesecloth and spun at 142,000 g for 37 minutes to collect microsomes. The microsome pellet was resuspended in KCl wash buffer (0.6 M KCl, 0.3 M sucrose, 50 mM Tris-HCl, pH 7.4) plus protease inhibitors and recentrifuged at 142,000 g for 37 minutes to obtain KCl-washed microsomes. The final pellet was resuspended in 0.3 M sucrose and 20 mM Tris-maleate, pH 7.0. Samples were quantified by the Coomassie Plus Protein Assay Reagent (Pierce Chemical Co.) and equivalent protein loading was verified by SDS-PAGE. KCl-washed microsomes were analyzed by Western blot using antibodies against β2-syntrophin, pan syntrophin, nNOS (Transduction Laboratories), β-dystroglycan, α-sarcoglycan (Novocastra Laboratories), and other proteins described above.

[0190] B. Results

[0191] i. Generation of Dystrophin Δ71-78 Transgenic Mice

[0192] To test the function of a dystrophin protein lacking both the syntrophin and dystrobrevin binding sites, we prepared a cDNA expression vector deleted for the COOH-terminal domain (corresponding to exons 71-78; See FIG. 19) as described above. The structures of several previously tested dystrophin transgenic constructs are also shown for comparison. Mice expressing the dystrophin Δ71-78 transgene were crossed onto the mdx background and dystrophin levels were analyzed by Western blotting. The expression of the dystrophin Δ71-78 transgene in skeletal muscle was determined to be 10-fold higher than endogenous dystrophin. Immunofluorescent staining of quadriceps muscle using an antibody against the NH₂-terminus of dystrophin revealed that the Δ71-78 protein was localized to the sarcolemma, similar to wild-type dystrophin. Dystrophin Δ71-78 expression was also found to be uniform in the diaphragm, EDL, and soleus muscles, but the tibialis anterior muscle displayed a mosaic expression pattern. The human skeletal muscle α-actin promoter used in this study was not expressed in either smooth or cardiac muscle.

[0193] ii. Morphology of Dystrophin Δ71-78 Mice Appears Normal

[0194] We initially analyzed transgenic mdx mouse muscle tissues for morphological signs of dystrophy. Hematoxylin and eosin-stained limb and diaphragm skeletal muscle sections of dystrophin Δ71-78 mice revealed none of the signs of fibrosis, necrotic fibers, or mononuclear cell infiltration that were apparent in age-matched mdx controls. NMJs (neuromuscular junctions) of transgenic mice stained with rhodamine-labeled α-bungarotoxin consistently appeared normal, in contrast to the varying degrees of postsynaptic folding observed in mdx NMJs. Mdx muscle fibers have previously been shown to be highly permeable to the vital dye Evans blue in vivo, reflecting damage to the dystrophic fiber sarcolemma (Matsuda et al., J. Biochem. (Tokyo), 118:959, 1995). Skeletal muscle fibers from dystrophin Δ71-78 mice, like those from wild-type animals, were found not to be permeable to Evans blue dye.

[0195] iii. Analysis of Centrally Nucleated Muscle Fibers

[0196] Another hallmark of dystrophy in mdx mice is the presence of large numbers of centrally-nucleated muscle fibers, reflecting cycles of fiber degeneration and regeneration (Torres and Duchen, Brain, 110:269, 1987). To estimate the degree of myofiber regeneration occurring in Δ71-78 transgenic mice, centrally nucleated fibers were counted from a variety of muscle groups in age-matched wild-type, mdx, and Δ71-78 mice (See Table 2). By 4 months of age, 71% of muscle fibers in mdx quadriceps muscles contained central nuclei, whereas wild-type muscles had <1%. Interestingly, 4-month-old dystrophin Δ71-78 quadriceps muscles displayed 1% central nuclei, indicating that very little, if any, regeneration was occurring. When 1-year-old mice were compared, a modest increase in centrally nucleated fibers became apparent. Quadriceps muscles from Δ71-78 mice contained 10% centrally nucleated fibers, although diaphragm muscles still displayed <1%. EDL and soleus muscles displayed 5 and 8% centrally nucleated fibers, respectively. For comparison, 1-year-old wild-type mice had <1% centrally nucleated fibers in both limb and diaphragm muscles. Furthermore, 1-year-old mdx limb muscles had 60% centrally nucleated fibers, whereas the diaphragm had 35%.

TABLE 2
Percentage of Centrally Nucleated Fibers in Mouse Skeletal Muscles

Line       Age (months)   Quad   Dia   TA    EDL   Soleus
C57Bl/10   4              <1     <1    ND    ND    ND
mdx        4              71     58    ND    ND    ND
Δ71-78     4              1      <1    ND    ND    ND
C57Bl/10   12             <1     <1    <1    <1    <1
mdx        12             65     35    58    50    61
Δ71-78     12             10     <1    ND    5     8
Δ71-74     15             5      <1    <1    <1    ND
Δ75-78     15             8      <1    4     2     7

[0197] Previous studies of transgenic mice expressing dystrophins deleted for exons 71-74 (Δ71-74) or exons 75-78 (Δ75-78) revealed no increase in the numbers of centrally nucleated fibers by 4 months of age (Rafael et al. 1996, see above). To contrast these mice with the Δ71-78 transgenics, central nuclei counts were performed on 15-month-old Δ71-74 and Δ75-78 mice. It was determined that these animals had central nuclei counts in between those of wild-type and Δ71-78 mice. The Δ71-74 and Δ75-78 mice had 5 and 8% centrally nucleated fibers in quadriceps, respectively (Table 2).

[0198] iv. Contractile Properties

[0199] Compared with muscles of wild-type mice, those from mdx mice displayed a significant amount of necrosis, fibrosis, and infiltrating mononuclear cells. mdx skeletal muscles also displayed a loss of specific force-generating capacity when muscles were stimulated to contract in vitro, providing an extremely sensitive and quantitative measurement of the dystrophic process (FIG. 20A). In contrast, dystrophin Δ71-78 mice had no major abnormalities when subjected to the same analysis (FIG. 20B). Muscle masses for both EDL and diaphragm were not significantly different between dystrophin Δ71-78 and wild-type mice, whereas dystrophin Δ71-78 soleus muscles were slightly hypertrophied. When stimulated to contract, all three muscle groups displayed specific forces not significantly different from wild-type (P<0.05). These results demonstrate that the dystrophin Δ71-78 protein has essentially the same functional capacity as the full-length protein.

[0200] v. Localization of the DAP Complex in Δ71-78 Mice

[0201] Immunofluorescent analysis of the peripheral DAP complex revealed α1-syntrophin, β1-syntrophin, α-dystrobrevin-1, and α-dystrobrevin-2 to be localized at the sarcolemma with dystrophin, despite the lack of syntrophin and dystrobrevin binding sites in the transgene-encoded dystrophin. α1-syntrophin levels were similar between wild-type and Δ71-78 mice. However, the levels of β1-syntrophin were elevated at the membrane in Δ71-78 mice, particularly in those fibers that normally express significant levels of this isoform. α-dystrobrevin-1 was primarily located at the NMJ in wild-type mice, and was exclusively located at the NMJs in mdx mice. Surprisingly, in dystrophin Δ71-78 mice, higher levels of α-dystrobrevin-1 were observed at the sarcolemma than in wild-type mice. The Δ71-78 mice also displayed a slight increase in utrophin localization along the sarcolemma, but this increase was less than the increase in mdx fibers. Immunofluorescent localization of the sarcoglycans, α- and β-dystroglycan, sarcospan, and nNOS in Δ71-78 mice revealed no differences in the expression of these proteins when compared with wild-type mice. The proper localization of these proteins to the sarcolemma indicated that membrane targeting of the DAP complex components can proceed in the absence of the COOH-terminal domain of dystrophin.

[0202] vi. DAP Complex Protein Levels

[0203] To examine the levels of the DAP complex members that associate with dystrophin, muscle microsomes were prepared from wild-type and dystrophin Δ71-78 mice and analyzed by Western blotting. This approach provides information on the relative abundance of individual DAP complex members in muscles of separate lines of mice. Slightly elevated levels of β-dystroglycan were detected in dystrophin Δ71-78 mice, which we have previously observed whenever dystrophin is overexpressed. Isoforms of syntrophin and dystrobrevin were present at slightly different levels when the dystrophin Δ71-78 membranes were compared with those from wild-type mice. α1-syntrophin and β2-syntrophin levels were lower than in wild-type mice, whereas the level of β1-syntrophin was elevated. Although there was approximately the same amount of α-dystrobrevin-2, there were elevated levels of α-dystrobrevin-1 in Δ71-78 microsomes. A reduction in nNOS was observed in dystrophin Δ71-78 muscle, indicating that nNOS binds weakly to the DAP complex in Δ71-78 mice. Levels of α-sarcoglycan were similar in all lines tested, and provided an internal control for protein loading.

[0204] Since some DAP complex members exhibited isoform changes in Δ71-78 mice, we examined purified microsomes from dystrophin Δ71-74 and Δ75-78 mice. Transgenic mdx mice that express the dystrophin isoform Dp71 in muscle were also included in this study since these dystrophic mice have the DAP complex present at the sarcolemma. α1-syntrophin levels were lower in all four transgenic lines compared with wild-type mice. Surprisingly, β1-syntrophin was absent in Δ71-74 microsomes but was highly overexpressed in Δ75-78 and Dp71 microsomes. The Δ71-74 microsomes had equivalent β2-syntrophin levels when compared with wild-type microsomes, but this isoform of syntrophin was reduced in both Δ75-78 and Dp71 microsomes. A pan syntrophin antibody, which detects all three isoforms of syntrophin, confirmed the upregulation of syntrophin in Δ75-78 and Dp71 microsomes. Similar to Δ71-78, α-dystrobrevin-1 was elevated in all dystrophin transgenic microsome preparations. However, in comparison with wild-type, α-dystrobrevin-2 was higher in Δ71-74 and Δ75-78, but equal in Dp71 microsomes. Contrary to the Δ71-78 mice, deleting either exons 71-74 or 75-78 restored nNOS to wild-type levels. However, Dp71 mice, which lack the NH₂-terminal and rod domains of dystrophin, did not retain nNOS in the microsome fractions. Previous studies have also shown that utrophin is upregulated in mdx and Dp71 mice (Ohlendieck et al., Neuron, 7:499-508, 1991). Therefore, utrophin levels were compared in all transgenic lines and we found that Δ71-78, Δ71-74, and Δ75-78 mice do not have the elevated levels seen in mdx and Dp71 mice.

EXAMPLE 2 Construction of ΔR4-R23, ΔR2-R21+H3, and ΔR2-R21

[0205] This example describes the construction of ΔR4-R23 (micro-dys1), ΔR2-R21+H3 (micro-dys3), and ΔR2-R21 (micro-dys2), three sequences with 4 spectrin-like repeat encoding sequences. The ‘full-length’ human dystrophin cDNA used as the starting material was actually slightly smaller than the true full-length human dystrophin cDNA. In particular, the starting sequence, called full-length HDMD (SEQ ID NO:47, see FIG. 23), is the same as the wild-type human dystrophin in SEQ ID NO:1, except that the 3′ 1861 base pairs are deleted (at an XbaI site), and the 39 base pair alternatively spliced exon 71 (bases 10432-10470) is deleted. This sequence (SEQ ID NO:47) is originally in pBSX (SEQ ID NO:46, See FIGS. 21 and 22).

[0206] A. Cloning ΔR4-R23

[0207] The procedure used for cloning ΔR4-R23 is outlined in FIG. 24. Initially, three PCR reactions were performed (employing Pfu polymerase) to create the deletion shown in FIG. 24. The primers employed in the first reaction were 5′ GAA CAA GAT TCA CAC AAC TGG C 3′ (SEQ ID NO:48), which anneals to 1954-1975 of the HDMD clone, and 5′ GTT CCT GGA GTC TTT CAA GAT CCA CAG TAA TCT GCC TC 3′ (SEQ ID NO:49), which is a reverse, tailed primer (the bold-faced sequence anneals to 2359-2341 of the HDMD clone, and the underlined sequence anneals to 9023-9005 of the HDMD clone). PCR was conducted employing these primers, and a 425 bp PCR product was produced. The first primer employed in the second reaction was 5′ GAG GCA GAT TAC TGT GGA TCT TGA AAG ACT CCA GGA AC 3′ (SEQ ID NO:50), which is the reverse complement of SEQ ID NO:49 (the bold-faced sequence of SEQ ID NO:50 anneals to 2341-2359 of the HDMD clone in the forward direction, and the underlined sequence anneals to 9005-9023 of the HDMD clone in the forward direction). The other primer employed for the second reaction was 5′ TGT TTG GCG AGA TGG CTC 3′ (SEQ ID NO:51), which anneals to 9413-9396 of HDMD in the reverse direction. PCR was conducted employing these primers, and a 427 bp PCR product was produced. The third reaction employed the products from steps 1 and 2 and the outside primers SEQ ID NO:48 and SEQ ID NO:51, producing an 814 bp fragment by PCR. This fragment was then digested with NcoI and HindIII to produce a 581 bp DNA fragment.
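The stated relationship between the two tailed primers (SEQ ID NO:50 being the reverse complement of SEQ ID NO:49) can be checked with a few lines of Python. This is a minimal sketch for illustration; the helper name `reverse_complement` and the variable names are assumptions, not part of the disclosure, and the primer strings are copied from the paragraph above.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, ignoring spaces."""
    cleaned = seq.replace(" ", "").upper()
    return cleaned.translate(_COMPLEMENT)[::-1]


# Tailed primers from the first and second PCR reactions.
SEQ_ID_NO_49 = "GTT CCT GGA GTC TTT CAA GAT CCA CAG TAA TCT GCC TC"
SEQ_ID_NO_50 = "GAG GCA GAT TAC TGT GGA TCT TGA AAG ACT CCA GGA AC"

# True when the two primers are exact reverse complements, which is what
# lets the two first-round products anneal to each other in the third reaction.
primers_are_complementary = (
    reverse_complement(SEQ_ID_NO_49) == SEQ_ID_NO_50.replace(" ", "")
)
overlap_length = len(SEQ_ID_NO_49.replace(" ", ""))
```

The same check can be applied to the other tailed primer pairs in these examples by substituting their sequences.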

[0208] This 581 bp fragment was then cloned into a 5016 bp NcoI + HindIII fragment from the HDMD clone. The 581 bp fragment contained part of repeat 3, all of Hinge 2, and part of repeat 24. The NcoI site used in the HDMD clone was located at 2055 bp. The HindIII site was located at 9281 bp. The 5016 bp fragment contained the pBSX cloning vector sequence, the entire 5′ UTR, the entire N terminus, Hinge 1, Repeats 1 and 2, and part of repeat 3 up to the NcoI site of human dystrophin. Ligation of the 5016 bp fragment and the 581 bp fragment (step 2) was then performed to create a 5597 bp sequence.

[0209] Step 3 was then performed to clone a 2.9 kb HindIII fragment containing part of repeat 24, the C terminus, and the 3′ UTR (See FIG. 24). The 5′ HindIII site is located at 9281 bp of the HDMD clone. The 3′ HindIII site of this fragment is derived from the pBSX polylinker. This 2.9 kb fragment was cloned into the HindIII site of the product of Step 2 to yield an 8.5 kb plasmid, composed of the ΔR4-R23 cDNA plus pBSX. The entire ΔR4-R23 cDNA was excised from pBSX with NotI and cloned into the NotI site of the HSA expression vector (in order: HSA promoter, VP1 intron, NotI site, tandem SV40 polyadenylation site).

[0210] B. Cloning ΔR2-R21+H3

[0211] The procedure used for cloning ΔR2-R21+H3 is outlined in FIG. 25. Initially, four PCR reactions were performed (employing Pfu polymerase) to create the deletion shown in FIG. 25. The primers employed in the first reaction were 5′ GAT GTG GAA GTG GTG AAA GAC 3′ (SEQ ID NO:52), which anneals to 1319-1330 of the HDMD clone, and 5′ CCA ATA GTG GTC AGT CCA GGA GCA TGT AAA TTG CTT TG 3′ (SEQ ID NO:53), which is a reverse, tailed primer (the bold-faced sequence anneals to 1546-1532 of the HDMD clone and the underlined sequence anneals to 7512-7490 of the HDMD clone). PCR was conducted with these primers and a 228 bp PCR product was produced. The first primer employed in the second reaction was 5′ CAA AGC AAT TTA CAT GCT CCT GGA CTG ACC ACT ATT GG 3′ (SEQ ID NO:54), which is the reverse complement of SEQ ID NO:53 (the bold-faced sequence of SEQ ID NO:54 anneals to 1532-1546 of the HDMD clone in the forward direction, and the underlined sequence anneals to 7490-7512 of the HDMD clone in the forward direction). The other primer employed in the second reaction was 5′ CTG TTG CAG TAA TCT ATG CTC CAA CAT CAA GGA AGA TG 3′ (SEQ ID NO:55), a reverse primer in which the bold-faced sequence anneals to 8287-8270 of the HDMD clone and the underlined sequence anneals to 7612-7593 of the HDMD clone. PCR was performed with these primers, and a 123 bp PCR product was produced. The first primer employed in the third reaction was 5′ CAT CTT CCT TGA TGT TGG AGC ATA GAT TAC TGC AAC AG 3′ (SEQ ID NO:56), in which the underlined sequence anneals to 7593-7612 of the HDMD clone in the forward direction, and the bold-faced sequence anneals to 8270-8287. The second primer employed in the third reaction was SEQ ID NO:51 (see above), which anneals to 9413-9396 in the reverse direction. PCR was performed with these primers, and a 1143 bp fragment was produced. The fourth reaction employed the products from reactions 1, 2, and 3 as template, and the outside primers (SEQ ID NO:52 and SEQ ID NO:51), and a 1494 bp fragment was produced using Pfu polymerase.

[0212] This 1494 bp fragment was then digested with MunI and HindIII to produce a 1270 bp band and cloned into a 4320 bp MunI+HindIII fragment from the HDMD clone. The 1270 bp fragment contained part of repeat 1, all of hinge 3, repeat 22, repeat 23, and part of repeat 24. The 4320 bp fragment contained the 5′ UTR of HDMD, the N terminus, Hinge 1, part of repeat 1, and pBSX. The MunI site in HDMD is located at base 1409. The HindIII site is at 9281 bp. Ligation of the 4320 bp fragment and the 1270 bp fragment was then performed (See FIG. 25) and a 5590 bp fragment was produced. Step 3 was performed as described above for ΔR4-R23 to generate ΔR2-R21+H3.

[0213] C. Cloning ΔR2-R21

[0214] The cloning of ΔR2-R21 was performed essentially the same way as for ΔR2-R21+H3, with the exception of the recombinant PCR reaction to assemble the rod domain deletion (See FIG. 26). All other steps are the same. Three PCR reactions were performed (using Pfu polymerase) to create the deletion. The primers employed in the first reaction were SEQ ID NO:52 (see above), and 5′ CTG TTG CAG TAA TCT ATG ATG TAA ATT GCT TTG 3′ (SEQ ID NO:57), in which the underlined sequence anneals to 8287-8270 of the HDMD clone in the reverse direction, and the bold-faced sequence anneals to 1546-1532 of the HDMD clone in the reverse direction. PCR was performed with these primers, and a 250 bp product was obtained. The first primer employed in the second reaction was 5′ CAA AGC AAT TTA CAT CAT AGA TTA CTG CAA CAG 3′ (SEQ ID NO:58), which is the reverse complement of SEQ ID NO:57 (the bold-faced sequence of SEQ ID NO:58 anneals to 1532-1546 of the HDMD clone in the forward direction, and the underlined sequence anneals to 8270-8287 of the HDMD clone in the forward direction). The other primer employed in the second reaction was SEQ ID NO:51, which anneals to 9413-9396 in the reverse direction. PCR was performed with these primers and a 1143 bp product was obtained. The third reaction employed the products from reactions 1 and 2 (as template) and the outside primers (SEQ ID NO:52 and SEQ ID NO:51), and a 1383 bp fragment was produced. This fragment was then digested with MunI and HindIII to produce an 1147 bp fragment containing part of repeat 1, repeat 22, repeat 23, and part of repeat 24. This was then cloned into the same MunI+HindIII HDMD fragment described for the ΔR2-R21+H3 clone, and all other steps thereafter were the same.

EXAMPLE 3 ΔR4-R23 Deletions

[0215] This example describes the construction of 5′ UTR, 3′ UTR, and C-terminal deletions of ΔR4-R23 (making it even smaller), as well as the addition of polyadenylation and promoter sequences. This example also describes the alteration of the Kozak sequence (to become more like the consensus).

[0216] A. Deletion of the 3′ UTR

[0217] In order to delete the 3′ UTR, the following two primers were employed: 5′ TCT CTC CAA GAT CAC CTC G 3′ (SEQ ID NO:64), which anneals to 9117-9134 of the HDMD full length clone, and 5′ ATG AAG CTT GCG GCC GCA TGC GGG AAT CAG GAG TTG 3′ (SEQ ID NO:65) (the underlined site is a HindIII site that was included in this primer and the bold-faced type is a NotI site). SEQ ID NO:65 is a reverse primer that anneals to 11340-11322 of HDMD in the 3′ UTR. These primers cause the deletion of 707 bp of the 3′ UTR from the XbaI cloning site located at 12057 to the end of this primer (SEQ ID NO:65), leaving 113 bp of native 3′ UTR, and introducing NotI and HindIII cloning sites. The PCR product obtained using the primers corresponding to SEQ ID NOS:64 and 65 on the pΔR4-R23 clone was named HdysΔ3′UTR and was saved for use as a template to generate a further deletion of exons 71-78 (see part C below).

[0218] B. Deletion of 5′ UTR and Alteration of Kozak Sequence

[0219] A portion of the 5′ UTR was deleted, and the Kozak sequence was altered in the same step. The ‘step 2’ clone from the cloning of ΔR4-R23 was utilized. This was the product of ligating the step 1 PCR product into the 5016 bp NcoI and HindIII fragment from the HDMD full-length clone, and it contained the pBSX backbone plus the 5′ UTR, N terminus, Hinge 1, Repeats 1, 2, 3, Hinge 2, and part of repeat 24. There is a MunI site located in the first repeat at nucleotide 1409 of the HDMD cDNA. In addition, there is a polylinker-derived NotI site at the 5′ end of the clone. These two sites, MunI+NotI, were employed to clone a new fragment containing a truncated 5′ UTR and an altered Kozak sequence as follows. PCR was performed using Pfu polymerase and the following primers. The first primer was 5′ TAG CGG CCG CGG TTT TTT TTA TCG CTG CCT TGA TAT ACA CTT TCC ACC ATG CTT TGG TGG GAA GAA GTA G 3′ (SEQ ID NO:59). We created a NotI site (underlined) in this primer so the product could be cloned back into the NotI site from the polylinker. The sequence immediately 3′ to this NotI site corresponds to the dystrophin 5′ UTR sequence (the original Kozak sequence was changed with this primer from TCAAAATGC to CCACCATGC). The second primer was 5′ TTT TCC TGT TCC AAT CAG C 3′ (SEQ ID NO:60), which anneals to sequence 1441-1423 of the HDMD full length clone. The final product of this reaction was 1270 bp and was digested with NotI+MunI to produce a 1233 bp fragment that was then cloned into the NotI (polylinker)+MunI site in Repeat 1 of the “Step 2” clones (described above for ΔR4-R23). This new clone was named pHDMD5′Kozak.
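The Kozak alteration carried by the forward primer can be inspected with a short Python sketch. The variable names are illustrative assumptions; the sequences are those quoted above, and GCGGCCGC is the standard NotI recognition sequence. The sketch lists which positions of the nine-base Kozak region differ and confirms that the primer carries the NotI site and the altered, but not the original, Kozak sequence.

```python
NOTI_SITE = "GCGGCCGC"          # NotI recognition sequence
ORIGINAL_KOZAK = "TCAAAATGC"    # native sequence around the ATG start codon
ALTERED_KOZAK = "CCACCATGC"     # sequence introduced by the primer

# Forward primer (SEQ ID NO:59), spaces removed.
SEQ_ID_NO_59 = (
    "TAG CGG CCG CGG TTT TTT TTA TCG CTG CCT TGA TAT ACA CTT TCC ACC ATG "
    "CTT TGG TGG GAA GAA GTA G"
).replace(" ", "")

# 0-based positions at which the altered Kozak region differs from the original.
changed_positions = [
    i for i, (old, new) in enumerate(zip(ORIGINAL_KOZAK, ALTERED_KOZAK))
    if old != new
]

# Where the NotI site starts in the primer, and whether each Kozak variant occurs.
noti_start = SEQ_ID_NO_59.find(NOTI_SITE)
has_altered_kozak = ALTERED_KOZAK in SEQ_ID_NO_59
has_original_kozak = ORIGINAL_KOZAK in SEQ_ID_NO_59
```

Three bases upstream of the ATG differ between the two versions, while the ATG itself and the following base are unchanged.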

[0220] C. Deletion of Exons 71-78 (C-terminal)

[0221] Using three PCR reactions, a deletion of exons 71-78 was created. We used the HindIII fragment containing the 3′ UTR that was generated by digestion of the HDMD full-length dystrophin cDNA with HindIII as the vector to clone the Δ71-78 fragment into the HindIII site. The primers employed for the first reaction were 5′ GGC TTC CTA CAT TGT GTC AGT TTC CAT GTT GTC CCC 3′ (SEQ ID NO:66), and 5′ TCT CTC CAA GAT CAC CTC 3′ (SEQ ID NO:67), which anneals to 9117-9134 of HDMD. PCR was performed employing these primers and a 1334 bp product was produced. The primers for the second reaction were SEQ ID NO:65, and 5′ GGG GAC AAC ATG GAA ACT GAC ACA ATG TAG GAA GCC 3′ (SEQ ID NO:68), where the bold-faced sequence anneals to exon 70 at 10415-10431 in the forward direction, and the underlined sequence anneals to 11216-11233 in the forward direction. PCR was performed and a 150 bp fragment was generated. The products of reactions 1 and 2 were used as template, and the outside primers SEQ ID NO:65 and SEQ ID NO:67 were used to prime the reaction, which generated the complete Δ71-78 C terminus (1484 bp). This product was digested with HindIII to produce a 1319 bp fragment and was cloned into the HindIII site of pTZ19R (See FIG. 35). This new clone was named pTZ-HDMD-H3Δ71-78Δ3.

[0222] D. Cloning of the SV40 pA Sequence into the Not I Site

[0223] The next step was the cloning of the SV 40 pA sequence: (SEQ IDNO:71) 5′GATCCAGACATGATAAGATACATTGATGAGTTTGGACAAACCACAACTAGAATGCAGTGAAAAAAATGCTTTATTTGTGAAATTTGTGATGCTATTGCTTTATTTGTAACCATTATAAGCTGCAATAAACAAGTTAACAACAACAATTGCATTCATTTTATGTTTCAGGTTCAGGGGGAGGTGTGGGAGGTTTTTTCG GATC3′

[0224] into the NotI site of the 3′ HindIII fragment that now contains the 3′ UTR and the Δ71-78 deletion. A PCR reaction was performed on the template pHSA with a reverse primer 5′ AGC GGC CGC AAA AAA CCT CCC ACA CCT CC 3′ (SEQ ID NO:69, containing a regenerating NotI site, underlined) and 5′ TAC GGC CGA TCC AGA CAT GAT AAG ATA C 3′ (SEQ ID NO:70, containing a destroying EagI site, in bold). All other sequence (besides the NotI and EagI sites) is SV40 pA. This PCR reaction generated a 209 bp product (195 bp of SV40 pA sequence plus the cloning sites). We then cloned this fragment into the NotI site of pTZ-HDMD-H3Δ71-78Δ3 generated by PCR in the 3′ UTR clone. The upstream (5′-most) NotI site in this clone was destroyed by EagI ligation. This new clone was named pTZ-HDMD-H33′A.

[0225] E. Cloning of CK6 Promoter into NotI Site

[0226] The CK6 promoter—5′ GGT- (SEQ ID NO:61)ACTACGGGTCTAGGCTGCCCATGTAAGGAGGCAAGGCCTGGGGACACCCGAGATGCCTGGTTATAATTAACCCCAACACCTGCTGCCCCCCCCCCCCCAACACCTGCTGCCTGAGCCTGAGCGGTTACCCCACCCCGGTGCCTGGGTCTTAGGCTCTGTACACCATGGAGGAGAAGCTCGCTCTAAAAATAACCCTGTCCCTGGTGGGCCCAATCAAGGCTGTGGGGGACTGAGGGCAGGCTGTAACAGGCTTGGGGGCCAGGGCTTATACGTGCCTGGGACTCCCAAAGTATTACTGTTCCATGTTCCCGGCGAAGGGCCAGCTGTCCCCCGCCAGCTAGACTCAGCACTTAGTTTAGGAACCAGTGAGCAAGTCAGCCCTTGGGGCAGCCCATACAAGGCCATGGGGCTGGGCAAGCTGCACGCCTGGGTCCGGGGTGGGCACGGTGCCCGGGCAACGAGCTGAAAGCTCATCTGCTCTCAGGGCCCCTCCCTGGGGACAGCCCCTCCTGGCTAGTCACACCCTGTAGGCTCCTCTATATAACCCAGGGGCACAGGGGCTGCCCCCGGGTCACGGGGATCCTCTAGACC-3′

[0227] was amplified using two tailed primers: 5′ AGC GGC CGC GGT ACT ACG GGT CTA GG 3′ Forward (SEQ ID NO:62), and 5′ ATC GGC CGT CTA GAG GAT CCC CGT GAC C 3′ Reverse (SEQ ID NO:63). In the forward primer, the underlined sequence is a NotI site added to the end of the primer, and the remaining sequence is CK6 sequence. In the reverse primer, the bold-faced type is an EagI site added to the end of the primer, and the remaining sequence is from CK6. The CK6 promoter was amplified in this way so that we could add the NotI and EagI sites (so that the entire cassette, once reassembled, could be excised with NotI). This PCR product was therefore digested with NotI and EagI and ligated into the NotI site of pHDMD5′Kozak. This new clone was named pCK6HDMD5′Kozak. NotI and EagI produce compatible cohesive ends, but when an EagI end ligates to a NotI end, the NotI site is destroyed. We therefore placed the EagI site at the 3′ end, so that when the final construct was cut with NotI, the entire expression cassette could be excised intact. The same strategy was employed at the 3′ end when placing the SV40 poly A sequence into the 3′ NotI site.
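The site-destroying ligation described above can be modeled on the top strand alone, because NotI (GC^GGCCGC) and EagI (C^GGCCG) both leave the same 5′-GGCC overhang. The Python sketch below is a simplified illustration under stated assumptions: the vector flanks (`AAAA`, `TTTT`) are hypothetical placeholders, and the insert is built from only the two primer-derived ends (SEQ ID NO:62 and the reverse complement of SEQ ID NO:63), omitting the promoter interior.

```python
NOTI = "GCGGCCGC"   # NotI cuts the top strand after GC:  GC^GGCCGC
EAGI = "CGGCCG"     # EagI cuts the top strand after C:   C^GGCCG

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, ignoring spaces."""
    return seq.replace(" ", "").upper().translate(_COMPLEMENT)[::-1]


# Top strand of a shortened PCR product: forward primer followed by the
# reverse complement of the reverse primer (promoter interior omitted).
forward_primer = "AGC GGC CGC GGT ACT ACG GGT CTA GG".replace(" ", "")
reverse_primer = "ATC GGC CGT CTA GAG GAT CCC CGT GAC C"
pcr_product = forward_primer + reverse_complement(reverse_primer)

# Digest: NotI at the 5' end (first site), EagI at the 3' end (last site).
five_prime_cut = pcr_product.index(NOTI) + 2
three_prime_cut = pcr_product.rindex(EAGI) + 1
insert = pcr_product[five_prime_cut:three_prime_cut]

# Hypothetical vector with a single NotI site, opened with NotI.
vector = "AAAA" + NOTI + "TTTT"
vector_cut = vector.index(NOTI) + 2
ligated = vector[:vector_cut] + insert + vector[vector_cut:]

# Count the sites remaining after ligation.
noti_sites_after_ligation = ligated.count(NOTI)
eagi_sites_after_ligation = ligated.count(EAGI)
```

In this model only the 5′ junction regenerates a NotI site; the 3′ junction (EagI end joined to NotI end) still contains an EagI site but no longer matches the NotI recognition sequence.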

[0228] F. Re-ligating the 5′ and 3′ Ends.

[0229] This step was performed as described above for the micro-dystrophin transgene constructs. We reconstituted the same cloning sites but with modifications in the fragments, so the modified 3′ end, isolated as a HindIII fragment from clone pTZ-HDMD-H33′A (Example 3, part D), was able to be cloned into the HindIII site of pCK6HDMD5′Kozak (Example 3, part E). This final clone, named pCK6R4-R23KozakΔ3′, contains a truncated dystrophin expression cassette that can be excised in its entirety by digestion with NotI. This excised expression cassette can then be used for a variety of purposes. One such purpose is to clone the cassette into a plasmid containing the inverted terminal repeats from adeno-associated virus. By cloning the dystrophin expression cassette HDMD-H33′A into a cloning site between the two ITRs of AAV, a recombinant AAV vector could be produced.

EXAMPLE 4 Construction of Reduced Repeat Dystrophin Constructs

[0230] This example describes the construction of ΔH2-R19 (an 8 spectrin-like-repeat sequence), pΔR9R16 (a 16 spectrin-like-repeat sequence), pΔR1R24 (a zero spectrin-like-repeat sequence), pΔH2-H3 (an 8 spectrin-like repeat sequence), and ΔH2-R19,R20 (a 7 spectrin-like repeat sequence). One starting plasmid was pHBMD, a human dystrophin cDNA (full-length HDMD, SEQ ID NO:47) with a further deletion of the sequences encoded by exons 17-48. The cDNA was cloned into the commercially available plasmid vector pTZ19r (MBI Fermentas; Genbank accession number Y14835, See FIG. 35), into which an EcoRI-SalI adapter (prepared by self-annealing of the oligonucleotide 5′-AATTCGTCGACG-3′, SEQ ID NO:83) had been ligated at the EcoRI site. Base number 1 of the cDNA is immediately 3′ of the adapter sequence, and the cDNA ends at the XbaI site at base 12,100 of SEQ ID NO:1. This XbaI site had been ligated into the XbaI site of the plasmid pTZ19r. Another starting plasmid is pBSX (SEQ ID NO:46), a modified version of pBluescript KSII+ (Stratagene), which is used to make pBSXA (pBSX into which the SV40 polyadenylation signal (pA) was added). This pA sequence was excised as a 206 bp fragment from pCMVβ (Clontech), blunt-ended with DNA polymerase I, and ligated into the blunt-ended KpnI site of pBSX.

[0231] Another starting plasmid is pCK3, which is pBSX with the 3.3 kb mouse muscle creatine kinase enhancer plus promoter attached to the minx intron (See Hauser et al., Mol Ther., 2:16-25, 2000). Another starting plasmid is pHΔSK, which is pHBMD digested with KpnI to remove the dystrophin sequences 3′ of the internal KpnI site (base 7,616 of the human dystrophin cDNA sequence, SEQ ID NO:1). A further starting vector is p44-1, which is pBluescript KS− (Stratagene) carrying a human dystrophin cDNA fragment spanning the EcoRI site at base 7,002 to the EcoRI site at base 7,875 of the full-length human dystrophin cDNA sequence, cloned into the EcoRI site of the vector. Another plasmid employed was p30-2, pBluescribe (Stratagene) containing a fragment from the full-length human dystrophin cDNA spanning bases 1,455 to the EcoRI site at base 2,647, cloned into the EcoRI site of the vector. An additional vector employed was p30-1, pBluescribe (Stratagene) containing an EcoRI fragment from the full-length human dystrophin cDNA spanning bases 2,647 to 4,558, cloned into the EcoRI site of the vector. A further plasmid employed is p47-4, pBluescript KS− (Stratagene) carrying the human dystrophin cDNA EcoRI fragment spanning bases 4,452 to 7,002 of the full-length cDNA sequence, cloned into the EcoRI site of the vector. Another plasmid is p9-7, pBluescribe (Stratagene) containing bases 1-1,538 of the full-length human dystrophin cDNA. Base 1 is attached to a linker of the sequence 5′-GAATTC-3′ and cloned into the EcoRI site of the vector. Base 1,538 is blunt-end cloned into the PstI site of the vector, which had been destroyed by fill-in with T4 DNA polymerase. Another vector employed is p63-1, pBluescript KS− (Stratagene) carrying the human dystrophin cDNA EcoRI fragment spanning bases 7,875 to the 3′ end of the full-length cDNA, cloned into the EcoRI site of the vector (the 3′ end of the cDNA is ligated to a linker of the sequence 5′-GAATTC-3′).

[0232] Initially, the MCK promoter plus enhancer and the minx intron were excised from pCK3 by digestion with EagI, yielding a 3.5 kb fragment that was ligated into EagI-digested pBSXA to make pBSXACK3. Truncated dystrophin cDNAs, derived from pHBMD, containing various deletions of dystrophin domains were prepared as described below. The cDNA inserts were excised from the plasmid backbone with SalI, and ligated into pBSXACK3 at the SalI site, which is located between the minx intron and the pA sequence, such that the 3′ end of the cDNA was adjacent to the pA sequence. The isolation of the truncated cDNAs is described below. pBSXACK3-truncated dystrophin plasmids were digested with BssHII to release the expression vectors, which were gel purified and used to generate transgenic mice.

[0233] Isolation of ΔH2R19

[0234] A PCR product was generated by amplification of plasmid p30-2 with primers 5′-TGTGCTGCAAGGCGATTAAGTTGG-3′ (SEQ ID NO:72) and 5′-GAGCTAGGTCAGGCTGCTGTGAAATCTGTGC-3′ (SEQ ID NO:75).

[0235] SEQ ID NO:75 overlaps the end of repeat 3 and the beginning of hinge 3. Primer SEQ ID NO:72 corresponds to a sequence in the plasmid vector adjacent to the cloning site. A second PCR product was generated by amplification of plasmid p44-1 using primers 5′-CCAGGCTTTACACTTTATGCTTCC-3′ (SEQ ID NO:73) and 5′-GCACAGATTTCACAGCAGCCTGACCTAGCTC-3′ (SEQ ID NO:74). Primer SEQ ID NO:74 is the reverse complement of primer SEQ ID NO:75. Primer SEQ ID NO:73 corresponds to a sequence in the plasmid vector adjacent to the cloning site. The PCR products were then purified by agarose gel electrophoresis, and quantified. A recombinant PCR product was then prepared by mixing together 10 ng of each of the first two PCR products, then re-PCR amplifying using only primers SEQ ID NO:72 and SEQ ID NO:73. This recombinant PCR product was then digested with NheI and KpnI, and ligated into NheI and KpnI digested pHΔSK to generate plasmid pHBMDΔH2 (NheI cuts at cDNA base 1,519, and KpnI cuts at base 7,616 of the full-length human dystrophin cDNA sequence). pHBMDΔH2 was then digested with KpnI and XbaI, and ligated to the KpnI-XbaI fragment from pHBMD (this latter fragment contains the full-length human dystrophin cDNA bases 7,616 to 12,100) to obtain plasmid pΔH2R19.
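
The statement that primer SEQ ID NO:74 is the reverse complement of primer SEQ ID NO:75 can be verified directly from the two sequences quoted above. The following is a minimal sketch; the helper names are illustrative and are not part of the specification.

```python
# Sketch: confirm that the two junction primers are reverse complements,
# which is what lets the two first-round PCR products anneal to each other
# in the recombinant (overlap) PCR step.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SEQ_ID_74 = "GCACAGATTTCACAGCAGCCTGACCTAGCTC"
SEQ_ID_75 = "GAGCTAGGTCAGGCTGCTGTGAAATCTGTGC"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def are_reverse_complements(a: str, b: str) -> bool:
    """True when sequence b is exactly the reverse complement of a."""
    return reverse_complement(a) == b.upper()


if __name__ == "__main__":
    print(are_reverse_complements(SEQ_ID_75, SEQ_ID_74))
```

Because the junction primers are fully complementary, the two first-round products share a 31-base overlap, and the outer vector primers alone suffice to amplify the fused product.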

[0236] Isolation of pΔR9R16

[0237] Plasmid p44-1 was digested with EcoRI and Asp718 to excise a 610 bp cDNA insert, which was ligated into pBSX digested with EcoRI and Asp718, yielding pBSX44AE. pBSX44AE was digested with EcoRI and XbaI, and ligated to the NheI-EcoRI cDNA-containing fragment from p30-2, yielding pBSX44AE/30-2NE. Plasmid pBSX44AE/30-2NE was linearized by digestion with EcoRI, into which was ligated the EcoRI-digested recombinant PCR product ΔR9-R16. This latter recombinant PCR product was generated as follows. Plasmid p30-1 was amplified with primers SEQ ID NO:72 and 5′-CCATTTCTCAACAGATCTTCCAAAGTCTTG-3′ (SEQ ID NO:77), and plasmid p47-4 was amplified by PCR with primers SEQ ID NO:73 and 5′-CAAGACTTTGGAAGATCTGTTGAGAAATGG-3′ (SEQ ID NO:76). A recombinant PCR product (ΔR9-R16) was then prepared by mixing together 10 ng of each of the first two PCR products, then re-PCR amplifying using only primers SEQ ID NO:72 and SEQ ID NO:73. This recombinant PCR product was then digested with EcoRI, and ligated into EcoRI digested pBSX44AE/30-2NE to generate plasmid pR9R16int. Plasmid pR9R16int was digested with NcoI and Asp718, and the 3 kb cDNA fragment was isolated and ligated into NcoI and Asp718 digested pHΔSK to generate pΔR9R16.

[0238] Isolation of pΔR1R24

[0239] Plasmid p9-7 was PCR amplified with PCR primers 5′-AGTGTGGTTTGCCAGCAGTC-3′ (SEQ ID NO:80) and 5′-CAAAGTCCCTGTGGGCGTCTTCAGGAGCTTCC-3′ (SEQ ID NO:79). Plasmid p63-1 was PCR amplified with primers 5′-GGAAGCTCCTGAAGACGCCCACAGGGACTTTG-3′ (SEQ ID NO:78) and 5′-TGGTTGATATAGTAGGGCAC-3′ (SEQ ID NO:81). A recombinant PCR product (ΔR1-R24) was then prepared by mixing together 10 ng of each of the first two PCR products, then re-PCR amplifying using only primers SEQ ID NO:80 and SEQ ID NO:81. This recombinant PCR product was then digested with SexAI and PpuMI, and ligated into SexAI and PpuMI digested pHBMD to generate plasmid pΔR1R24.
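
The recombinant PCR step relies on the two first-round products sharing the junction sequence: the top strand of the first product ends with the sequence of SEQ ID NO:78 (its reverse primer, SEQ ID NO:79, is the reverse complement of SEQ ID NO:78), and the second product begins with it. The sketch below illustrates the resulting fusion at the sequence level. The flanking bases are hypothetical placeholders, not dystrophin sequence, and the function name is an assumption made for illustration.

```python
# Sketch: fuse two fragments that share a junction sequence, as happens
# when overlapping PCR products prime each other in recombinant PCR.
JUNCTION = "GGAAGCTCCTGAAGACGCCCACAGGGACTTTG"  # SEQ ID NO:78 as quoted


def join_by_overlap(left: str, right: str, overlap: str) -> str:
    """Return the fused sequence, keeping a single copy of the overlap.

    `left` must end with `overlap` and `right` must start with it.
    """
    if not left.endswith(overlap) or not right.startswith(overlap):
        raise ValueError("fragments do not share the expected overlap")
    return left + right[len(overlap):]


if __name__ == "__main__":
    # Hypothetical 8-base flanks stand in for the real upstream and
    # downstream cDNA sequence.
    upstream_product = "ACGTACGT" + JUNCTION
    downstream_product = JUNCTION + "TTGGCCAA"
    print(join_by_overlap(upstream_product, downstream_product, JUNCTION))
```

The fused product contains one copy of the junction, so the sequence between the two halves of the junction primer is absent from the final clone.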

[0240] Isolation of pΔH2-H3

[0241] This clone was prepared exactly as pΔH2-R19, except that primer 5′-CAGATTTCACAGGCTGCTCTGGCAGATTTC-3′ (SEQ ID NO:82) was used in place of primer SEQ ID NO:74, and primer 5′-GAAATCTGCCAGAGCAGCCTGTGAAATCTG-3′ (SEQ ID NO:84) was used in place of primer SEQ ID NO:75.

[0242] Isolation of ΔH2-R19,R20

[0243] This clone was generated from clone pΔH2R19 as follows. Plasmid p44-1 was amplified with primers SEQ ID NO:72 and 5′-TGAATCCTTTAACATAGGTACCTCCAACAT-3′ (SEQ ID NO:85). Plasmid p63-1 was amplified with primers 5′-ATGTTGGAGGTACCTATGTTAAAGGATTCA-3′ (SEQ ID NO:86) and SEQ ID NO:81. The PCR products were then purified by agarose gel electrophoresis, and quantified. A recombinant PCR product was then prepared by mixing together 10 ng of each of the first two PCR products, then re-PCR amplifying using only primers SEQ ID NO:72 and SEQ ID NO:81. This product was digested with Asp718 and BstXI, and ligated into Asp718 and BstXI digested pHBMD, generating clone pBMDΔR20. The Asp718-XbaI cDNA-containing fragment from pBMDΔR20 was isolated and ligated into Asp718 and XbaI digested pΔH2R19 to generate pΔH2-R19,R20.
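
Because the recombinant product is subsequently cut with Asp718, it is useful to note that both junction primers contain the sequence GGTACC, which is the recognition sequence shared by Asp718 and KpnI (the two enzymes cut at different positions within it). The following sketch locates that site in the primers quoted above; the function name is illustrative only.

```python
# Sketch: locate the Asp718/KpnI recognition sequence in the junction primers.
ASP718_SITE = "GGTACC"

SEQ_ID_85 = "TGAATCCTTTAACATAGGTACCTCCAACAT"
SEQ_ID_86 = "ATGTTGGAGGTACCTATGTTAAAGGATTCA"


def find_sites(seq: str, site: str) -> list:
    """Return the 0-based start positions of every occurrence of `site`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


if __name__ == "__main__":
    for name, primer in (("SEQ ID NO:85", SEQ_ID_85), ("SEQ ID NO:86", SEQ_ID_86)):
        print(name, find_sites(primer, ASP718_SITE))
```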

EXAMPLE 5 Testing Truncated Dystrophin in mdx Mice

[0244] This example describes the generation of transgenic mdx mice expressing truncated dystrophin cDNA (see above), and the testing of these mice to measure a range of muscle properties. A variety of dystrophin expression cassettes (FIG. 27) were used to generate transgenic mice to test their functional capacity in alleviating muscular dystrophy on the dystrophin null mdx background. FIG. 27 depicts the truncated dystrophin cDNA sequences tested, all of which were linked to regulatory regions, a minx intron, and the SV40 polyadenylation sequence (the 4-repeat constructs employed the HSA actin promoter, See Crawford et al., J. Cell. Biol., 150:1399, 2000; and the remaining sequences employed an MCK enhancer and promoter, see Niwa et al., Genes Dev. 4:1552, 1990). Each of these constructs was released by digestion from plasmid hosts, gel purified, and used to generate transgenic mice.

[0245] Excised expression cassettes were injected into wild type C57Bl/10×SJL/J F2 hybrid embryos, and F⁰ mice were screened by PCR analysis of DNA isolated from tail snips. Positive F⁰ mice were backcrossed onto the C57Bl/10mdx background, and individual mouse lines were tested for dystrophin expression in skeletal muscle fibers by immunofluorescent analysis with dystrophin antibodies. Lines that displayed uniform expression of dystrophin in muscle fibers were selected for further analysis. These lines were further backcrossed onto the mdx mouse background before analysis of dystrophin expression, muscle function and morphology.

[0246] A. Truncated Dystrophin cDNAs are Expressed at Various Levels in Muscles of Transgenic mdx Mice.

[0247] Muscle extracts were analyzed by western (immuno) blot analysis to determine the amount of dystrophin made in different muscles of the transgenic mdx mice. For these studies, total protein was extracted from the quadriceps and diaphragm muscles of control and transgenic mice, and protein concentrations were determined using the Coomassie Plus Protein Assay Reagent (Pierce). One hundred micrograms of each sample was electrophoresed on a 6% polyacrylamide/SDS gel (29.7:0.3/acryl:bis), and transferred for 2 hours at 75 volts onto Biotrace Nitrocellulose (Gelman Science) in 1× Tris-Glycine, 20% methanol, 0.05% SDS, using a wet-transfer apparatus (Hoefer). Membranes were blocked in 10% non-fat dry milk, 1% normal goat serum, and 0.1% Tween-20, and hybridized with DYS1 (Novocastra) at a 1/1000 dilution for 2 hours at room temperature, washed, and then probed with horseradish peroxidase conjugated anti-mouse antibodies at a 1/2,000 dilution (Cappel). Blots were developed using the ECL chemiluminescence system (Amersham). All incubations contained 1% normal goat serum and 0.1% Tween-20. The results of the western blot indicated that R9-R16 was poorly expressed in this line of mice, especially in the diaphragm, and that H2-H3 was very poorly expressed in the diaphragm.
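
Loading a fixed protein mass per lane and diluting antibodies at a fixed ratio both reduce to simple arithmetic. The sketch below shows one way to compute the volumes; the sample concentration and incubation volume used in the example are hypothetical, since the specification gives only the 100 microgram load and the 1/1000 and 1/2,000 dilutions.

```python
# Sketch: volume arithmetic for gel loading and antibody dilution.

def loading_volume_ul(target_ug: float, conc_ug_per_ul: float) -> float:
    """Volume of extract (microliters) that contains `target_ug` of protein."""
    if conc_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ug / conc_ug_per_ul


def antibody_volume_ul(total_volume_ml: float, dilution_factor: float) -> float:
    """Volume of antibody stock (microliters) for a 1/dilution_factor dilution."""
    if dilution_factor <= 0:
        raise ValueError("dilution factor must be positive")
    return total_volume_ml * 1000.0 / dilution_factor


if __name__ == "__main__":
    # Hypothetical extract at 8 ug/ul; 100 ug load as stated in the text.
    print(loading_volume_ul(100, 8))
    # Hypothetical 10 ml incubation volumes at the stated dilutions.
    print(antibody_volume_ul(10, 1000))
    print(antibody_volume_ul(10, 2000))
```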

[0248] B. Truncated Dystrophin cDNAs Confer Various Degrees of Protection on Muscles of Transgenic mdx Mice.

[0249] Various muscle groups from the different lines of transgenic mice expressing truncated dystrophins were examined for morphological abnormalities, and for expression of dystrophin by indirect immunofluorescence (IF) in individual fibers. IF analysis was performed as follows. Skeletal muscle was removed from control and transgenic animals, cut into strips, embedded in Tissue-tek OCT mounting media (Miles, Inc.), and frozen quickly in liquid nitrogen-cooled isopentane. Seven micrometer sections were blocked with 1% gelatin in KPBS for 15 minutes, washed in KPBS+0.2% gelatin (KPBSG), and incubated for 2 hours in KPBSG+1% normal goat serum with affinity-purified dystrophin antibody 18-4 (Cox et al., Nature, 364:725-729, 1993) at a dilution of 1/1000. After washing, the slides were incubated for 1 hour with biotin-labeled goat anti-rabbit polyclonal antibodies (Pierce), washed again, and incubated with FITC (fluorescein isothiocyanate)-conjugated streptavidin. After a final wash, Vectashield (Vector Laboratories, Inc.) with DAPI was applied and sections were photographed through a dual bandpass filter under 40× magnification using a Nikon E1000 microscope.

[0250] Morphological analysis of the muscles was performed as follows. Muscle groups from among the following types were chosen for analysis: Quadriceps (Quad), soleus, extensor digitorum longus (EDL), tibialis anterior (TA), and diaphragm muscles. Selected muscles were removed from mice, frozen in liquid nitrogen-cooled O.C.T. embedding medium (Tissue-Tek), and cut into 7 μm sections. After fixing in 3.7% formaldehyde, sections were stained in hematoxylin and eosin-phloxine. Stained sections were imaged with a Nikon E1000 microscope and photographed.

[0251] The results of this analysis show that micro-dystrophin expression (ΔR4R23 transgene) in the diaphragm prevents the onset of muscular dystrophy in mdx mice. In particular, micro-dystrophin transgenic and wild-type C57Bl/10 diaphragm sections stained with hematoxylin and eosin (H&E) show morphologically healthy muscle without areas of fibrosis, necrosis, mononuclear cell infiltration, or centrally located nuclei. Conversely, the mdx diaphragm displays a high level of dystrophic morphology by H&E. Also, immunofluorescence, using anti-dystrophin polyclonal primary antisera, demonstrates that micro-dystrophin transgenes are expressed at the sarcolemmal membrane in a similar fashion to that of wild-type dystrophin, while mdx mice do not express dystrophin.

[0252] H&E staining also shows that truncated dystrophins with 8 or 16 spectrin-like repeats have varying abilities to prevent dystrophy in the diaphragm of transgenic mdx mice. The H2R19 construct maintains normal muscle morphology that is not different from wild-type C57Bl/10 muscle. The ΔH2R19 muscle displays a very low percentage of centrally nucleated fibers, while the ΔH2-R19,R20 and ΔR9-R16 constructs display percentages intermediate between ΔH2-R19 and mdx (see FIG. 28). The mdx diaphragm had a large number of centrally nucleated fibers, many necrotic fibers, and large areas of mononuclear cell infiltration and fibrosis.

[0253] The results also show that quadriceps muscle fibers expressing a micro-dystrophin transgene (ΔR4R23 transgene) display normal morphology and exclude Evans Blue Dye. Micro-dystrophin transgenic mdx or C57Bl/10 quadriceps sections stained with hematoxylin and eosin (H&E) display morphologically healthy muscle without areas of necrosis, fibrosis, mononuclear cell infiltration, or centrally-located nuclei, as opposed to sections of mdx muscle. The high abundance of central nuclei and mononuclear immune cell infiltration are evidence of muscle cell necrosis. Immunofluorescence results indicate that micro-dystrophins display a subsarcolemmal expression pattern like that of wild-type dystrophin, while mdx mice do not express dystrophin. Evans Blue Dye (EBD) uptake is an indication of a damaged myofiber. For analysis of EBD uptake, mice were tail vein injected with 150 μl of a solution containing 10 mg/ml Evans blue dye in PBS (150 mM NaCl, 50 mM Tris pH 7.4). After three hours, the animals were euthanized and mouse tissues were either fixed in 3.7% formaldehyde/0.5% glutaraldehyde to observe gross dye uptake, or frozen unfixed in O.C.T. embedding medium. To examine Evans blue uptake by individual fibers, 7 μm thick frozen sections were fixed in cold acetone and analyzed by fluorescence microscopy. The results of this testing indicate that fibers expressing micro-dystrophin or wild-type dystrophin exclude EBD, and that damaged mdx muscle cell membranes are permeable to Evans Blue dye.

[0254] A hallmark of dystrophy in mdx mice is the presence of large numbers of centrally-nucleated muscle fibers, reflecting cycles of fiber degeneration and regeneration. To estimate the degree of myofiber regeneration occurring in the transgenic mice, centrally-nucleated fibers were counted from quadriceps muscles in age-matched wild-type, mdx, and transgenic mdx mice (FIG. 28). To determine the percentage of fibers containing central nuclei, the number of muscle fibers with centrally-located nuclei was divided by the total number of muscle fibers.
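
The percentage calculation described above can be written as a short function. This is a minimal sketch; the fiber counts in the example are hypothetical and are not data from FIG. 28.

```python
# Sketch: percentage of fibers with centrally-located nuclei.

def percent_central_nuclei(central_fibers: int, total_fibers: int) -> float:
    """Fibers with central nuclei divided by all fibers, as a percentage."""
    if total_fibers <= 0:
        raise ValueError("total fiber count must be positive")
    if not 0 <= central_fibers <= total_fibers:
        raise ValueError("central fiber count must lie between 0 and the total")
    return 100.0 * central_fibers / total_fibers


if __name__ == "__main__":
    # Hypothetical counts from one quadriceps section.
    print(percent_central_nuclei(30, 400))
```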

[0255] Expression of 8 or 4 repeat micro-dystrophin transgenes on the mdx background significantly reduces the percentage of fibers with centrally-located nuclei to wild-type or near wild-type levels (FIG. 28). Dystrophin molecules with zero repeats are unable to correct the mdx phenotype by this assay. The best constructs were observed to be the 8 repeat H2-R19 and the 4 repeat R2-R23 constructs. Greater percentages of centrally nucleated fibers were observed in mice expressing the exon 17-48 deletion, the 4 repeat R2R21 construct, the 7 repeat H2R19,R20 construct, the 16 repeat R9R16 construct, and the zero repeat R1R24 construct (FIG. 28). The results from the R9R16 construct likely do not reflect the full functional capacity of the 16 repeat dystrophin since this line of mice expressed very low levels of the truncated dystrophin protein. All other muscles expressed levels of dystrophin that have been shown to be capable of preventing dystrophy if the expressed protein is functional (Phelps et al., Hum Mol Genet, 4:1251-1258, 1995).

[0256] The functional capacity of the truncated dystrophins was also assessed by measuring muscle contractile properties in the transgenic mdx mice. Contractile properties of muscles from transgenic mice were compared with those of C57Bl/10 wild type and mdx mice. The samples included 4-8 muscles each from the tibialis anterior (TA), extensor digitorum longus (EDL) or diaphragm. Mice were deeply anesthetized with avertin and each muscle was isolated and dissected free from the mouse. After removal of the limb muscles, the mice were euthanized with the removal of the diaphragm muscle. The muscles were immersed in a bath filled with oxygenated buffered mammalian Ringer's solution (137 mM NaCl, 24 mM NaHCO₃, 11 mM glucose, 5 mM KCl, 2 mM CaCl₂, 1 mM MgSO₄, 1 mM NaH₂PO₄, and 0.025 mM tubocurarine chloride, pH 7.4). For each muscle, one tendon was tied to a servomotor and the other tendon to a force transducer. Muscles were stretched from slack length to the optimal length for force development and then stimulated at a frequency that produced absolute isometric tetanic force (mN). Following the measurements of the contractile properties, the muscles were removed from the bath, blotted and weighed to determine muscle mass. Specific force (kN/m²) was calculated by dividing absolute force by total fiber cross sectional area.
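
The specific force calculation is a unit-normalized division. The sketch below assumes the cross sectional area is supplied in mm², which the specification does not state, and it does not attempt to derive the area from muscle mass, since that procedure is not described. The example values are hypothetical.

```python
# Sketch: specific force from absolute force and fiber cross sectional area.

def specific_force_kn_per_m2(force_mn: float, csa_mm2: float) -> float:
    """Specific force in kN/m^2.

    1 mN/mm^2 = (1e-3 N) / (1e-6 m^2) = 1e3 N/m^2 = 1 kN/m^2, so dividing
    millinewtons by square millimeters gives kN/m^2 directly.
    """
    if csa_mm2 <= 0:
        raise ValueError("cross sectional area must be positive")
    return force_mn / csa_mm2


if __name__ == "__main__":
    # Hypothetical muscle: 500 mN tetanic force, 2 mm^2 fiber area.
    print(specific_force_kn_per_m2(500.0, 2.0))
```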

[0257] FIG. 29 shows that the 8 repeat dystrophin encoded by H2-R19 supports normal force development in both the diaphragm (FIG. 29a) and EDL muscle (FIG. 29b). In contrast, previous studies showed that the exon 17-48 construct, which encodes a dystrophin with 8.25 spectrin-like repeats, supports only 90-95% of normal force development in the diaphragm (Phelps et al., Hum Mol Genet, 4:1251-1258, 1995). The 8 repeat dystrophin lacking a central hinge (H2-H3), and the 7 repeat dystrophin (H2-R19,R20) both fail to support significant force generation compared with dystrophic mdx muscles. The results from the R9-R16 construct likely do not reflect the full functional capacity of the 16 repeat dystrophin, since this line of mice expressed very low levels of the truncated dystrophin.

[0258] FIG. 30 shows that the micro-dystrophin transgenic mdx mice develop less specific force than do C57Bl/10 mice in the TA, but near wild-type levels in the diaphragm. Micro-dys 1 and -2 refer to transgenes ΔR4-R23 and ΔR2-R21, respectively. FIG. 30A shows that C57Bl/10 mice display significantly higher specific force than both transgenic lines and mdx mice in the tibialis anterior (TA) muscle. Data are presented as means ± standard error of the means (s.e.m.) with each bar representing 6 to 8 TA muscles. ANOVA statistical testing was performed. (* indicates significance from C57Bl/10, p<0.01; s indicates significance from C57Bl/10, p<0.05). FIG. 30B shows that mice expressing Micro-dys 1 develop wild type levels of specific force in the diaphragm, while mice expressing Micro-dys 2 develop ~22% less specific force by the same assay when compared with C57Bl/10. Both lines of mice develop more specific force than mdx mice in the diaphragm. Data are presented as the percentage of wild type.

[0259] Dystrophic mice are susceptible to contraction-induced injury (Petrof et al., Proc. Natl. Acad. Sci. USA. 90:3710-3714, 1993). This part of the example tested whether the 4 repeat dystrophin clones would protect muscles of transgenic mdx mice from contraction-induced injuries. To test contraction-induced injury, an experimental protocol consisting of two muscle stretches was performed in live, anesthetized animals. The distal tendon of the TA was cut and secured to the lever arm of a servomotor that monitors position and force produced by the muscle. Stimulation voltage and optimal muscle length (L₀) for force production were determined. The muscle was maximally stimulated and then stretched 40% greater than L₀ (LC1) for 300 milliseconds. A second lengthening contraction was performed 10 seconds later (LC2). The maximum force that the muscle was able to produce after each stretch was measured and expressed as a percentage of the force produced before stretch. Mdx mice expressing micro-dystrophins were significantly protected from the dramatic force deficit produced after a lengthening contraction compared with mdx mice (FIG. 31). Micro-dys 1 and -2 refer to transgenes ΔR4-R23 and ΔR2-R21, respectively. Furthermore, there was no significant difference between either micro-dystrophin construct studied in this assay and C57Bl/10 mice following the second, most damaging lengthening contraction. Data are presented as means ± s.e.m. with each bar representing between 6 and 8 TA muscles from 9-11 week old mice.
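
The two quantities used in this protocol, the stretched muscle length and the post-stretch force expressed as a percentage of pre-stretch force, can be sketched as follows. The numerical inputs in the example are hypothetical; only the 40% stretch beyond L₀ is taken from the text.

```python
# Sketch: quantities used in the lengthening-contraction protocol.

def stretched_length(l0_mm: float, strain: float = 0.40) -> float:
    """Muscle length when stretched `strain` (fraction) beyond optimal length L0."""
    if l0_mm <= 0:
        raise ValueError("L0 must be positive")
    return l0_mm * (1.0 + strain)


def percent_of_prestretch_force(force_after: float, force_before: float) -> float:
    """Maximum force after a stretch as a percentage of the pre-stretch force."""
    if force_before <= 0:
        raise ValueError("pre-stretch force must be positive")
    return 100.0 * force_after / force_before


if __name__ == "__main__":
    # Hypothetical TA muscle: L0 of 10 mm, 800 mN before and 600 mN after LC1.
    print(stretched_length(10.0))
    print(percent_of_prestretch_force(600.0, 800.0))
```

A force deficit, if preferred, is simply 100 minus the percentage returned by the second function.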

[0260] C. Truncated 4 Repeat Dystrophin cDNAs Restore the Ability to Run Long Distances to mdx Mice.

[0261] We have observed that mdx mice are not able to run for long distances on a treadmill, as compared to wild-type mice (see below). Therefore, mice expressing four repeat dystrophins were compared with wild-type and mdx mice for ability to run for extended times on a treadmill. The exercising protocol utilized a six lane, enclosed treadmill with a shock grid to allow forced running at a controlled rate. C57Bl/10, C57Bl/6×SJL F1, mdx or transgenic mdx mice were run at a 15 degree downward angle to induce damaging eccentric muscle contractions. Mice were given a 15 minute acclimation period prior to exercise, and then ran at 10 meters/minute with a subsequent 5 m/min increase in rate every 10 minutes until exhaustion. Exhaustion was determined to be the time at which a mouse spent more than 5 seconds sitting on the shock grid without attempting a re-entry to the treadmill. As shown in FIG. 32, both lines of four repeat transgenic mice ran significantly farther than mdx mice. Micro-dys 1 and -2 refer to transgenes ΔR4-R23 and ΔR2-R21, respectively. Micro-dystrophin transgenic mice are a genetic mixture of C57Bl/6×SJL and C57Bl/10 strains, and ran an intermediate distance between the two wild-type lines. Data are presented as means ± s.e.m. ANOVA statistical analyses were performed. (* indicates values significantly different from mdx line, p<0.01; s indicates values significantly different from mdx line, p<0.05).
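
The distance covered under this stepped-speed protocol follows from the time to exhaustion. The sketch below assumes that the 15 minute acclimation period is not counted toward distance and that speed changes occur exactly at each 10 minute boundary; both are assumptions, as the specification does not spell them out.

```python
# Sketch: distance run under a stepped-speed treadmill protocol
# (10 m/min at the start, +5 m/min every 10 minutes).

def distance_run_m(minutes_to_exhaustion: float,
                   start_speed: float = 10.0,
                   increment: float = 5.0,
                   stage_minutes: float = 10.0) -> float:
    """Total meters run before exhaustion, summed stage by stage."""
    if minutes_to_exhaustion < 0:
        raise ValueError("time to exhaustion cannot be negative")
    distance = 0.0
    speed = start_speed
    remaining = minutes_to_exhaustion
    while remaining > 0:
        step = min(stage_minutes, remaining)  # time spent in this stage
        distance += speed * step
        remaining -= step
        speed += increment                    # next stage is faster
    return distance


if __name__ == "__main__":
    # Hypothetical mouse that reaches exhaustion after 25 minutes:
    # 10 min at 10 m/min, 10 min at 15 m/min, 5 min at 20 m/min.
    print(distance_run_m(25))
```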

[0262] D. Micro-dystrophin Transgenic mdx Mice do not Display Hypertrophy

[0263] As a way to measure the functional capacity of the four-repeat dystrophins, we weighed both whole mice and dissected tibialis anterior muscles from age matched transgenic and control mice. The results in FIG. 33 show that the micro-dystrophin transgenic mdx mice do not display the muscle hypertrophy normally observed in mdx mice. FIG. 33A shows that three month old micro-dystrophin transgenic mdx mice weighed significantly less than age-matched mdx control mice. FIG. 33B shows that tibialis anterior (TA) muscle masses in mdx mice were significantly higher than control muscle masses in C57Bl/10 and in both lines of mdx mice expressing different micro-dystrophin transgenes. Data are presented as means ± s.e.m. with each bar representing between 3 and 4 mice. ANOVA statistical analyses were performed (* indicates difference from mdx line, p<0.01; Y indicates difference from C57Bl/10 line, p<0.01; s indicates difference from C57Bl/10 line, p<0.05). Micro-dys 1 and -2 refer to transgenes ΔR4-R23 and ΔR2-R21, respectively.

EXAMPLE 6 Mini-dystrophin-containing Adeno-associated Viral Vectors

[0264] This example describes a construct that could be made in order to allow adeno-associated virus to express a mini-dystrophin peptide in target muscle cells. FIG. 34 shows a schematic illustration of a plasmid vector containing the adeno-associated virus inverted terminal repeats (AAV-ITRs), the muscle promoter plus enhancer fragment known as CK6 (SEQ ID NO:61), the ΔR2-R21 four repeat dystrophin cDNA (SEQ ID NO:40) with a further deletion of sequences encoded on exons 71-78, plus a 195 base pair SV40 polyadenylation signal, giving a total insert size of approximately 4.7 kb. The cloning capacity of adeno-associated viral vectors is approximately 4.9 kb. As such, the construct could be efficiently packaged into AAV viral particles (e.g. this plasmid construct could be used to transfect cells such that AAV encoding the mini-dystrophin peptide is produced). These AAV then, for example, may be administered to a subject with DMD or BMD (i.e. gene therapy to correct a muscle deficiency in a subject).
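
The packaging argument is a simple size comparison. The sketch below uses only the approximate totals stated above (a 4.7 kb insert and a 4.9 kb capacity); the sizes of the individual elements other than the 195 bp polyadenylation signal are not given here, so no per-element breakdown is attempted. The function names are illustrative.

```python
# Sketch: compare an insert size with an approximate AAV cloning capacity.
AAV_CAPACITY_BP = 4900  # approximate capacity stated in the text


def fits_packaging_limit(insert_bp: int, capacity_bp: int = AAV_CAPACITY_BP) -> bool:
    """True when the insert is no larger than the stated capacity."""
    return insert_bp <= capacity_bp


def spare_capacity_bp(insert_bp: int, capacity_bp: int = AAV_CAPACITY_BP) -> int:
    """Base pairs left over (negative if the insert is too large)."""
    return capacity_bp - insert_bp


if __name__ == "__main__":
    insert_bp = 4700  # approximate total insert size stated in the text
    print(fits_packaging_limit(insert_bp), spare_capacity_bp(insert_bp))
```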

[0265] All publications and patents mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are obvious to those skilled in material science, chemistry, and molecular biology or related fields are intended to be within the scope of the following claims.

1 96 1 13957 DNA Homo sapiens 1 gggattccct cactttcccc ctacaggactcagatctggg aggcaattac cttcggagaa 60 aaacgaatag gaaaaactga agtgttactttttttaaagc tgctgaagtt tgttggtttc 120 tcattgtttt taagcctact ggagcaataaagtttgaaga acttttacca ggtttttttt 180 atcgctgcct tgatatacac ttttcaaaatgctttggtgg gaagaagtag aggactgtta 240 tgaaagagaa gatgttcaaa agaaaacattcacaaaatgg gtaaatgcac aattttctaa 300 gtttgggaag cagcatattg agaacctcttcagtgaccta caggatggga ggcgcctcct 360 agacctcctc gaaggcctga cagggcaaaaactgccaaaa gaaaaaggat ccacaagagt 420 tcatgccctg aacaatgtca acaaggcactgcgggttttg cagaacaata atgttgattt 480 agtgaatatt ggaagtactg acatcgtagatggaaatcat aaactgactc ttggtttgat 540 ttggaatata atcctccact ggcaggtcaaaaatgtaatg aaaaatatca tggctggatt 600 gcaacaaacc aacagtgaaa agattctcctgagctgggtc cgacaatcaa ctcgtaatta 660 tccacaggtt aatgtaatca acttcaccaccagctggtct gatggcctgg ctttgaatgc 720 tctcatccat agtcataggc cagacctatttgactggaat agtgtggttt gccagcagtc 780 agccacacaa cgactggaac atgcattcaacatcgccaga tatcaattag gcatagagaa 840 actactcgat cctgaagatg ttgataccacctatccagat aagaagtcca tcttaatgta 900 catcacatca ctcttccaag ttttgcctcaacaagtgagc attgaagcca tccaggaagt 960 ggaaatgttg ccaaggccac ctaaagtgactaaagaagaa cattttcagt tacatcatca 1020 aatgcactat tctcaacaga tcacggtcagtctagcacag ggatatgaga gaacttcttc 1080 ccctaagcct cgattcaaga gctatgcctacacacaggct gcttatgtca ccacctctga 1140 ccctacacgg agcccatttc cttcacagcatttggaagct cctgaagaca agtcatttgg 1200 cagttcattg atggagagtg aagtaaacctggaccgttat caaacagctt tagaagaagt 1260 attatcgtgg cttctttctg ctgaggacacattgcaagca caaggagaga tttctaatga 1320 tgtggaagtg gtgaaagacc agtttcatactcatgagggg tacatgatgg atttgacagc 1380 ccatcagggc cgggttggta atattctacaattgggaagt aagctgattg gaacaggaaa 1440 attatcagaa gatgaagaaa ctgaagtacaagagcagatg aatctcctaa attcaagatg 1500 ggaatgcctc agggtagcta gcatggaaaaacaaagcaat ttacatagag ttttaatgga 1560 tctccagaat cagaaactga aagagttgaatgactggcta acaaaaacag aagaaagaac 1620 aaggaaaatg gaggaagagc ctcttggacctgatcttgaa gacctaaaac gccaagtaca 1680 acaacataag gtgcttcaag 
aagatctagaacaagaacaa gtcagggtca attctctcac 1740 tcacatggtg gtggtagttg atgaatctagtggagatcac gcaactgctg ctttggaaga 1800 acaacttaag gtattgggag atcgatgggcaaacatctgt agatggacag aagaccgctg 1860 ggttctttta caagacatcc ttctcaaatggcaacgtctt actgaagaac agtgcctttt 1920 tagtgcatgg ctttcagaaa aagaagatgcagtgaacaag attcacacaa ctggctttaa 1980 agatcaaaat gaaatgttat caagtcttcaaaaactggcc gttttaaaag cggatctaga 2040 aaagaaaaag caatccatgg gcaaactgtattcactcaaa caagatcttc tttcaacact 2100 gaagaataag tcagtgaccc agaagacggaagcatggctg gataactttg cccggtgttg 2160 ggataattta gtccaaaaac ttgaaaagagtacagcacag atttcacagg ctgtcaccac 2220 cactcagcca tcactaacac agacaactgtaatggaaaca gtaactacgg tgaccacaag 2280 ggaacagatc ctggtaaagc atgctcaagaggaacttcca ccaccacctc cccaaaagaa 2340 gaggcagatt actgtggatt ctgaaattaggaaaaggttg gatgttgata taactgaact 2400 tcacagctgg attactcgct cagaagctgtgttgcagagt cctgaatttg caatctttcg 2460 gaaggaaggc aacttctcag acttaaaagaaaaagtcaat gccatagagc gagaaaaagc 2520 tgagaagttc agaaaactgc aagatgccagcagatcagct caggccctgg tggaacagat 2580 ggtgaatgag ggtgttaatg cagatagcatcaaacaagcc tcagaacaac tgaacagccg 2640 gtggatcgaa ttctgccagt tgctaagtgagagacttaac tggctggagt atcagaacaa 2700 catcatcgct ttctataatc agctacaacaattggagcag atgacaacta ctgctgaaaa 2760 ctggttgaaa atccaaccca ccaccccatcagagccaaca gcaattaaaa gtcagttaaa 2820 aatttgtaag gatgaagtca accggctatcaggtcttcaa cctcaaattg aacgattaaa 2880 aattcaaagc atagccctga aagagaaaggacaaggaccc atgttcctgg atgcagactt 2940 tgtggccttt acaaatcatt ttaagcaagtcttttctgat gtgcaggcca gagagaaaga 3000 gctacagaca atttttgaca ctttgccaccaatgcgctat caggagacca tgagtgccat 3060 caggacatgg gtccagcagt cagaaaccaaactctccata cctcaactta gtgtcaccga 3120 ctatgaaatc atggagcaga gactcggggaattgcaggct ttacaaagtt ctctgcaaga 3180 gcaacaaagt ggcctatact atctcagcaccactgtgaaa gagatgtcga agaaagcgcc 3240 ctctgaaatt agccggaaat atcaatcagaatttgaagaa attgagggac gctggaagaa 3300 gctctcctcc cagctggttg agcattgtcaaaagctagag gagcaaatga ataaactccg 3360 aaaaattcag aatcacatac aaaccctgaagaaatggatg gctgaagttg 
atgtttttct 3420 gaaggaggaa tggcctgccc ttggggattcagaaattcta aaaaagcagc tgaaacagtg 3480 cagactttta gtcagtgata ttcagacaattcagcccagt ctaaacagtg tcaatgaagg 3540 tgggcagaag ataaagaatg aagcagagccagagtttgct tcgagacttg agacagaact 3600 caaagaactt aacactcagt gggatcacatgtgccaacag gtctatgcca gaaaggaggc 3660 cttgaaggga ggtttggaga aaactgtaagcctccagaaa gatctatcag agatgcacga 3720 atggatgaca caagctgaag aagagtatcttgagagagat tttgaatata aaactccaga 3780 tgaattacag aaagcagttg aagagatgaagagagctaaa gaagaggccc aacaaaaaga 3840 agcgaaagtg aaactcctta ctgagtctgtaaatagtgtc atagctcaag ctccacctgt 3900 agcacaagag gccttaaaaa aggaacttgaaactctaacc accaactacc agtggctctg 3960 cactaggctg aatgggaaat gcaagactttggaagaagtt tgggcatgtt ggcatgagtt 4020 attgtcatac ttggagaaag caaacaagtggctaaatgaa gtagaattta aacttaaaac 4080 cactgaaaac attcctggcg gagctgaggaaatctctgag gtgctagatt cacttgaaaa 4140 tttgatgcga cattcagagg ataacccaaatcagattcgc atattggcac agaccctaac 4200 agatggcgga gtcatggatg agctaatcaatgaggaactt gagacattta attctcgttg 4260 gagggaacta catgaagagg ctgtaaggaggcaaaagttg cttgaacaga gcatccagtc 4320 tgcccaggag actgaaaaat ccttacacttaatccaggag tccctcacat tcattgacaa 4380 gcagttggca gcttatattg cagacaaggtggacgcagct caaatgcctc aggaagccca 4440 gaaaatccaa tctgatttga caagtcatgagatcagttta gaagaaatga agaaacataa 4500 tcaggggaag gaggctgccc aaagagtcctgtctcagatt gatgttgcac agaaaaaatt 4560 acaagatgtc tccatgaagt ttcgattattccagaaacca gccaattttg agctgcgtct 4620 acaagaaagt aagatgattt tagatgaagtgaagatgcac ttgcctgcat tggaaacaaa 4680 gagtgtggaa caggaagtag tacagtcacagctaaatcat tgtgtgaact tgtataaaag 4740 tctgagtgaa gtgaagtctg aagtggaaatggtgataaag actggacgtc agattgtaca 4800 gaaaaagcag acggaaaatc ccaaagaacttgatgaaaga gtaacagctt tgaaattgca 4860 ttataatgag ctgggagcaa aggtaacagaaagaaagcaa cagttggaga aatgcttgaa 4920 attgtcccgt aagatgcgaa aggaaatgaatgtcttgaca gaatggctgg cagctacaga 4980 tatggaattg acaaagagat cagcagttgaaggaatgcct agtaatttgg attctgaagt 5040 tgcctgggga aaggctactc aaaaagagattgagaaacag aaggtgcacc tgaagagtat 5100 cacagaggta ggagaggcct 
tgaaaacagttttgggcaag aaggagacgt tggtggaaga 5160 taaactcagt cttctgaata gtaactggatagctgtcacc tcccgagcag aagagtggtt 5220 aaatcttttg ttggaatacc agaaacacatggaaactttt gaccagaatg tggaccacat 5280 cacaaagtgg atcattcagg ctgacacacttttggatgaa tcagagaaaa agaaacccca 5340 gcaaaaagaa gacgtgctta agcgtttaaaggcagaactg aatgacatac gcccaaaggt 5400 ggactctaca cgtgaccaag cagcaaacttgatggcaaac cgcggtgacc actgcaggaa 5460 attagtagag ccccaaatct cagagctcaaccatcgattt gcagccattt cacacagaat 5520 taagactgga aaggcctcca ttcctttgaaggaattggag cagtttaact cagatataca 5580 aaaattgctt gaaccactgg aggctgaaattcagcagggg gtgaatctga aagaggaaga 5640 cttcaataaa gatatgaatg aagacaatgagggtactgta aaagaattgt tgcaaagagg 5700 agacaactta caacaaagaa tcacagatgagagaaagaga gaggaaataa agataaaaca 5760 gcagctgtta cagacaaaac ataatgctctcaaggatttg aggtctcaaa gaagaaaaaa 5820 ggctctagaa atttctcatc agtggtatcagtacaagagg caggctgatg atctcctgaa 5880 atgcttggat gacattgaaa aaaaattagccagcctacct gagcccagag atgaaaggaa 5940 aataaaggaa attgatcggg aattgcagaagaagaaagag gagctgaatg cagtgcgtag 6000 gcaagctgag ggcttgtctg aggatggggccgcaatggca gtggagccaa ctcagatcca 6060 gctcagcaag cgctggcggg aaattgagagcaaatttgct cagtttcgaa gactcaactt 6120 tgcacaaatt cacactgtcc gtgaagaaacgatgatggtg atgactgaag acatgccttt 6180 ggaaatttct tatgtgcctt ctacttatttgactgaaatc actcatgtct cacaagccct 6240 attagaagtg gaacaacttc tcaatgctcctgacctctgt gctaaggact ttgaagatct 6300 ctttaagcaa gaggagtctc tgaagaatataaaagatagt ctacaacaaa gctcaggtcg 6360 gattgacatt attcatagca agaagacagcagcattgcaa agtgcaacgc ctgtggaaag 6420 ggtgaagcta caggaagctc tctcccagcttgatttccaa tgggaaaaag ttaacaaaat 6480 gtacaaggac cgacaagggc gatttgacagatctgttgag aaatggcggc gttttcatta 6540 tgatataaag atatttaatc agtggctaacagaagctgaa cagtttctca gaaagacaca 6600 aattcctgag aattgggaac atgctaaatacaaatggtat cttaaggaac tccaggatgg 6660 cattgggcag cggcaaactg ttgtcagaacattgaatgca actggggaag aaataattca 6720 gcaatcctca aaaacagatg ccagtattctacaggaaaaa ttgggaagcc tgaatctgcg 6780 gtggcaggag gtctgcaaac agctgtcagacagaaaaaag aggctagaag 
aacaaaagaa 6840 tatcttgtca gaatttcaaa gagatttaaatgaatttgtt ttatggttgg aggaagcaga 6900 taacattgct agtatcccac ttgaacctggaaaagagcag caactaaaag aaaagcttga 6960 gcaagtcaag ttactggtgg aagagttgcccctgcgccag ggaattctca aacaattaaa 7020 tgaaactgga ggacccgtgc ttgtaagtgctcccataagc ccagaagagc aagataaact 7080 tgaaaataag ctcaagcaga caaatctccagtggataaag gtttccagag ctttacctga 7140 gaaacaagga gaaattgaag ctcaaataaaagaccttggg cagcttgaaa aaaagcttga 7200 agaccttgaa gagcagttaa atcatctgctgctgtggtta tctcctatta ggaatcagtt 7260 ggaaatttat aaccaaccaa accaagaaggaccatttgac gttcaggaaa ctgaaatagc 7320 agttcaagct aaacaaccgg atgtggaagagattttgtct aaagggcagc atttgtacaa 7380 ggaaaaacca gccactcagc cagtgaagaggaagttagaa gatctgagct ctgagtggaa 7440 ggcggtaaac cgtttacttc aagagctgagggcaaagcag cctgacctag ctcctggact 7500 gaccactatt ggagcctctc ctactcagactgttactctg gtgacacaac ctgtggttac 7560 taaggaaact gccatctcca aactagaaatgccatcttcc ttgatgttgg aggtacctgc 7620 tctggcagat ttcaaccggg cttggacagaacttaccgac tggctttctc tgcttgatca 7680 agttataaaa tcacagaggg tgatggtgggtgaccttgag gatatcaacg agatgatcat 7740 caagcagaag gcaacaatgc aggatttggaacagaggcgt ccccagttgg aagaactcat 7800 taccgctgcc caaaatttga aaaacaagaccagcaatcaa gaggctagaa caatcattac 7860 ggatcgaatt gaaagaattc agaatcagtgggatgaagta caagaacacc ttcagaaccg 7920 gaggcaacag ttgaatgaaa tgttaaaggattcaacacaa tggctggaag ctaaggaaga 7980 agctgagcag gtcttaggac aggccagagccaagcttgag tcatggaagg agggtcccta 8040 tacagtagat gcaatccaaa agaaaatcacagaaaccaag cagttggcca aagacctccg 8100 ccagtggcag acaaatgtag atgtggcaaatgacttggcc ctgaaacttc tccgggatta 8160 ttctgcagat gataccagaa aagtccacatgataacagag aatatcaatg cctcttggag 8220 aagcattcat aaaagggtga gtgagcgagaggctgctttg gaagaaactc atagattact 8280 gcaacagttc cccctggacc tggaaaagtttcttgcctgg cttacagaag ctgaaacaac 8340 tgccaatgtc ctacaggatg ctacccgtaaggaaaggctc ctagaagact ccaagggagt 8400 aaaagagctg atgaaacaat ggcaagacctccaaggtgaa attgaagctc acacagatgt 8460 ttatcacaac ctggatgaaa acagccaaaaaatcctgaga tccctggaag gttccgatga 8520 tgcagtcctg ttacaaagac 
gtttggataacatgaacttc aagtggagtg aacttcggaa 8580 aaagtctctc aacattaggt cccatttggaagccagttct gaccagtgga agcgtctgca 8640 cctttctctg caggaacttc tggtgtggctacagctgaaa gatgatgaat taagccggca 8700 ggcacctatt ggaggcgact ttccagcagttcagaagcag aacgatgtac atagggcctt 8760 caagagggaa ttgaaaacta aagaacctgtaatcatgagt actcttgaga ctgtacgaat 8820 atttctgaca gagcagcctt tggaaggactagagaaactc taccaggagc ccagagagct 8880 gcctcctgag gagagagccc agaatgtcactcggcttcta cgaaagcagg ctgaggaggt 8940 caatactgag tgggaaaaat tgaacctgcactccgctgac tggcagagaa aaatagatga 9000 gacccttgaa agactccagg aacttcaagaggccacggat gagctggacc tcaagctgcg 9060 ccaagctgag gtgatcaagg gatcctggcagcccgtgggc gatctcctca ttgactctct 9120 ccaagatcac ctcgagaaag tcaaggcacttcgaggagaa attgcgcctc tgaaagagaa 9180 cgtgagccac gtcaatgacc ttgctcgccagcttaccact ttgggcattc agctctcacc 9240 gtataacctc agcactctgg aagacctgaacaccagatgg aagcttctgc aggtggccgt 9300 cgaggaccga gtcaggcagc tgcatgaagcccacagggac tttggtccag catctcagca 9360 ctttctttcc acgtctgtcc agggtccctgggagagagcc atctcgccaa acaaagtgcc 9420 ctactatatc aaccacgaga ctcaaacaacttgctgggac catcccaaaa tgacagagct 9480 ctaccagtct ttagctgacc tgaataatgtcagattctca gcttatagga ctgccatgaa 9540 actccgaaga ctgcagaagg ccctttgcttggatctcttg agcctgtcag ctgcatgtga 9600 tgccttggac cagcacaacc tcaagcaaaatgaccagccc atggatatcc tgcagattat 9660 taattgtttg accactattt atgaccgcctggagcaagag cacaacaatt tggtcaacgt 9720 ccctctctgc gtggatatgt gtctgaactggctgctgaat gtttatgata cgggacgaac 9780 agggaggatc cgtgtcctgt cttttaaaactggcatcatt tccctgtgta aagcacattt 9840 ggaagacaag tacagatacc ttttcaagcaagtggcaagt tcaacaggat tttgtgacca 9900 gcgcaggctg ggcctccttc tgcatgattctatccaaatt ccaagacagt tgggtgaagt 9960 tgcatccttt gggggcagta acattgagccaagtgtccgg agctgcttcc aatttgctaa 10020 taataagcca gagatcgaag cggccctcttcctagactgg atgagactgg aaccccagtc 10080 catggtgtgg ctgcccgtcc tgcacagagtggctgctgca gaaactgcca agcatcaggc 10140 caaatgtaac atctgcaaag agtgtccaatcattggattc aggtacagga gtctaaagca 10200 ctttaattat gacatctgcc aaagctgctttttttctggt cgagttgcaa 
aaggccataa 10260 aatgcactat cccatggtgg aatattgcactccgactaca tcaggagaag atgttcgaga 10320 ctttgccaag gtactaaaaa acaaatttcgaaccaaaagg tattttgcga agcatccccg 10380 aatgggctac ctgccagtgc agactgtcttagagggggac aacatggaaa ctcccgttac 10440 tctgatcaac ttctggccag tagattctgcgcctgcctcg tcccctcagc tttcacacga 10500 tgatactcat tcacgcattg aacattatgctagcaggcta gcagaaatgg aaaacagcaa 10560 tggatcttat ctaaatgata gcatctctcctaatgagagc atagatgatg aacatttgtt 10620 aatccagcat tactgccaaa gtttgaaccaggactccccc ctgagccagc ctcgtagtcc 10680 tgcccagatc ttgatttcct tagagagtgaggaaagaggg gagctagaga gaatcctagc 10740 agatcttgag gaagaaaaca ggaatctgcaagcagaatat gaccgtctaa agcagcagca 10800 cgaacataaa ggcctgtccc cactgccgtcccctcctgaa atgatgccca cctctcccca 10860 gagtccccgg gatgctgagc tcattgctgaggccaagcta ctgcgtcaac acaaaggccg 10920 cctggaagcc aggatgcaaa tcctggaagaccacaataaa cagctggagt cacagttaca 10980 caggctaagg cagctgctgg agcaaccccaggcagaggcc aaagtgaatg gcacaacggt 11040 gtcctctcct tctacctctc tacagaggtccgacagcagt cagcctatgc tgctccgagt 11100 ggttggcagt caaacttcgg actccatgggtgaggaagat cttctcagtc ctccccagga 11160 cacaagcaca gggttagagg aggtgatggagcaactcaac aactccttcc ctagttcaag 11220 aggaagaaat acccctggaa agccaatgagagaggacaca atgtaggaag tcttttccac 11280 atggcagatg atttgggcag agcgatggagtccttagtat cagtcatgac agatgaagaa 11340 ggagcagaat aaatgtttta caactcctgattcccgcatg gtttttataa tattcataca 11400 acaaagagga ttagacagta agagtttacaagaaataaat ctatattttt gtgaagggta 11460 gtggtattat actgtagatt tcagtagtttctaagtctgt tattgttttg ttaacaatgg 11520 caggttttac acgtctatgc aattgtacaaaaaagttata agaaaactac atgtaaaatc 11580 ttgatagcta aataacttgc catttctttatatggaacgc attttgggtt gtttaaaaat 11640 ttataacagt tataaagaaa gattgtaaactaaagtgtgc tttataaaaa aaagttgttt 11700 ataaaaaccc ctaaaaacaa aacaaacacacacacacaca catacacaca cacacacaaa 11760 actttgaggc agcgcattgt tttgcatccttttggcgtga tatccatatg aaattcatgg 11820 ctttttcttt ttttgcatat taaagataagacttcctcta ccaccacacc aaatgactac 11880 tacacactgc tcatttgaga actgtcagctgagtggggca ggcttgagtt ttcatttcat 
11940 atatctatat gtctataagt atataaatactatagttata tagataaaga gatacgaatt 12000 tctatagact gactttttcc attttttaaatgttcatgtc acatcctaat agaaagaaat 12060 tacttctagt cagtcatcca ggcttacctgcttggtctag aatggatttt tcccggagcc 12120 ggaagccagg aggaaactac accacactaaaacattgtct acagctccag atgtttctca 12180 ttttaaacaa ctttccactg acaacgaaagtaaagtaaag tattggattt ttttaaaggg 12240 aacatgtgaa tgaatacaca ggacttattatatcagagtg agtaatcggt tggttggttg 12300 attgattgat tgattgatac attcagcttcctgctgctag caatgccacg atttagattt 12360 aatgatgctt cagtggaaat caatcagaaggtattctgac cttgtgaaca tcagaaggta 12420 ttttttaact cccaagcagt agcaggacgatgatagggct ggagggctat ggattcccag 12480 cccatccctg tgaaggagta ggccactctttaagtgaagg attggatgat tgttcataat 12540 acataaagtt ctctgtaatt acaactaaattattatgccc tcttctcaca gtcaaaagga 12600 actgggtggt ttggtttttg ttgcttttttagatttattg tcccatgtgg gatgagtttt 12660 taaatgccac aagacataat ttaaaataaataaactttgg gaaaaggtgt aagacagtag 12720 ccccatcaca tttgtgatac tgacaggtatcaacccagaa gcccatgaac tgtgtttcca 12780 tcctttgcat ttctctgcga gtagttccacacaggtttgt aagtaagtaa gaaagaaggc 12840 aaattgattc aaatgttaca aaaaaacccttcttggtgga ttagacaggt taaatatata 12900 aacaaacaaa caaaaattgc tcaaaaaagaggagaaaagc tcaagaggaa aagctaagga 12960 ctggtaggaa aaagctttac tctttcatgccattttattt ctttttgatt tttaaatcat 13020 tcattcaata gataccaccg tgtgacctataattttgcaa atctgttacc tctgacatca 13080 agtgtaatta gcttttggag agtgggctgacatcaagtgt aattagcttt tggagagtgg 13140 gttttgtcca ttattaataa ttaattaattaacatcaaac acggcttctc atgctatttc 13200 tacctcactt tggttttggg gtgttcctgataattgtgca cacctgagtt cacagcttca 13260 ccacttgtcc attgcgttat tttctttttcctttataatt ctttcttttt ccttcataat 13320 tttcaaaaga aaacccaaag ctctaaggtaacaaattacc aaattacatg aagatttggt 13380 ttttgtcttg catttttttc ctttatgtgacgctggacct tttctttacc caaggatttt 13440 taaaactcag atttaaaaca aggggttactttacatccta ctaagaagtt taagtaagta 13500 agtttcattc taaaatcaga ggtaaatagagtgcataaat aattttgttt taatcttttt 13560 gtttttcttt tagacacatt agctctggagtgagtctgtc ataatatttg aacaaaaatt 13620 
gagagcttta ttgctgcatt ttaagcataattaatttgga cattatttcg tgttgtgttc 13680 tttataacca ccgagtatta aactgtaaatcataatgtaa ctgaagcata aacatcacat 13740 ggcatgtttt gtcattgttt tcaggtactgagttcttact tgagtatcat aatatattgt 13800 gttttaacac caacactgta acatttacgaattatttttt taaacttcag ttttactgca 13860 ttttcacaac atatcagact tcaccaaatatatgccttac tattgtatta tagtactgct 13920 ttactgtgta tctcaataaa gcacgcagttatgttac 13957
2 13815 DNA Mus musculus 2
cctcactcac ttgccccttacaggactcag ctcttgaagg caatagcttt atagaaaaaa 60 cgaataggaa gacttgaagtgctatttttt tttttttttt tgtcaaggct gctgaagttt 120 attggcttct catcgtacctaagcctcctg gagcaataaa actgggagaa acttttacca 180 agatttttat ccctgccttgatatatactt tttcttccaa atgctttggt gggaagaagt 240 agaggactgt tatgaaagagaagatgttca aaagaaaaca ttcacaaaat ggataaatgc 300 acaattttct aagtttggaaagcaacacat agacaacctc ttcagtgacc tgcaggatgg 360 aaaacgcctc ctagacctcttggaaggcct tacagggcaa aaactgccaa aagaaaaggg 420 atctacaaga gttcatgccctgaacaatgt caacaaggca ctgcgggtct tacagaaaaa 480 taatgttgat ttagtgaatataggaagcac tgacatagtg gatggaaatc ataaactcac 540 tcttggtttg atttggaatataatcctcca ctggcaggtc aaaaatgtga tgaaaactat 600 catggctgga ttgcagcaaaccaacagtga aaagattctt ctgagctggg ttcgacagtc 660 aacacgtaat tatccacaggttaacgtcat caacttcacc tctagctggt ccgacgggtt 720 ggctttgaat gctcttatccatagtcacag gcccgacctg tttgattgga atagtgtggt 780 ttcacagcac tcagccacccaaagactgga acatgccttc aacattgcaa aatgccagtt 840 aggcatagaa aaacttcttgatcctgaaga tgttgctacc acttatccag acaagaagtc 900 catcttaatg tacatcacatcactctttca agttttgcca caacaagtga gcattgaagc 960 cattcaagaa gtggaaatgttgcccaggac atcttcaaaa gtaactagag aagaacattt 1020 tcaattacat caccagatgcattactctca acagatcaca gtcagtctag cacagggcta 1080 tgaacaaact tcttcatctcctaagcctcg attcaagagt tatgccttca cacaggctgc 1140 ttatgttgcc acctctgattccacacagag cccctatcct tcacagcatt tggaagctcc 1200 cagagacaag tcacttgacagttcattgat ggagacggaa gtaaatctgg atagttacca 1260 aactgcttta gaagaagtactttcatggct tctttctgcc gaggatacat tgcgagcaca 1320 aggagagatt tcaaatgatgttgaagaagt gaaagaacag
tttcatgctc atgagggatt 1380 catgatggat ctgacatctcatcaaggact tgttggtaat gttctacagt taggaagtca 1440 actagttgga aaagggaaattatcagaaga tgaagaagct gaagtgcaag aacaaatgaa 1500 tctcctaaat tcaagatgggaatgtctcag ggtagctagc atggaaaaac aaagcaaatt 1560 acacaaagtt ctaatggatctccagaatca gaaattaaaa gaactagatg actggttaac 1620 aaaaactgaa gagagaactaagaaaatgga ggaagagccc tttggacctg atcttgaaga 1680 tctaaaatgc caagtacaacaacataaggt gcttcaagaa gatctagaac aggagcaggt 1740 cagggtcaac tcgctcactcacatggtagt agtggttgat gaatccagcg gtgatcatgc 1800 aacagctgct ttggaagaacaacttaaggt actgggagat cgatgggcaa atatctgcag 1860 atggactgaa gaccgctggattgttttaca agatattctt ctaaaatggc agcattttac 1920 tgaagaacag tgcctttttagtacatggct ttcagaaaaa gaagatgcaa tgaagaacat 1980 tcagacaagt ggctttaaagatcaaaatga aatgatgtca agtcttcaca aaatatctac 2040 tttaaaaata gatctagaaaagaaaaagcc aaccatggaa aaactaagtt cactcaatca 2100 agatctactt tcggcactgaaaaataagtc agtgactcaa aagatggaaa tctggatgga 2160 aaactttgca caacgttgggacaatttaac ccaaaaactt gaaaagagtt cagcacaaat 2220 ttcacaggct gtcaccaccactcaaccatc cctaacacag acaactgtaa tggaaacggt 2280 aactatggtg accacaagggaacaaatcat ggtaaaacat gcccaagagg aacttccacc 2340 accacctcct caaaagaagaggcagataac tgtggattct gaactcagga aaaggttgga 2400 tgtcgatata actgaacttcacagttggat tactcgttca gaagctgtat tacagagttc 2460 tgaatttgca gtctatcgaaaagaaggcaa catctcagac ttgcaagaaa aagtcaatgc 2520 catagcacga gaaaaagcagagaagttcag aaaactgcaa gatgccagca gatcagctca 2580 ggccctggtg gaacagatggcaaatgaggg tgttaatgct gaaagtatca gacaagcttc 2640 agaacaactg aacagccggtggacagaatt ctgccaattg ctgagtgaga gagttaactg 2700 gctagagtat caaaccaacatcattacctt ttataatcag ctacaacaat tggaacagat 2760 gacaactact gccgaaaacttgttgaaaac ccagtctacc accctatcag agccaacagc 2820 aattaaaagc cagttaaaaatttgtaagga tgaagtcaac agattgtcag ctcttcagcc 2880 tcaaattgag caattaaaaattcagagtct acaactgaaa gaaaagggac aggggccaat 2940 gtttctggat gcagactttgtggcctttac taatcatttt aaccacatct ttgatggtgt 3000 gagggccaaa gagaaagagctacagacaat ttttgacact ttaccaccaa tgcgctatca 3060 ggagacaatg 
agtagcatcaggacgtggat ccagcagtca gaaagcaaac tctctgtacc 3120 ttatcttagt gttactgaatatgaaataat ggaggagaga ctcgggaaat tacaggctct 3180 gcaaagttct ttgaaagagcaacaaaatgg cttcaactat ctgagtgaca ctgtgaagga 3240 gatggccaag aaagcaccttcagaaatatg ccagaaatat ctgtcagaat ttgaagagat 3300 tgaggggcac tggaagaaactttcctccca gttggtggaa agctgccaaa agctagaaga 3360 acatatgaat aaacttcgaaaatttcagaa tcacataaaa accttacaga aatggatggc 3420 tgaagttgat gttttcctgaaagaggaatg gcctgccctg ggggatgctg aaatcctgaa 3480 aaaacagctc aaacaatgcagacttttagt tggtgatatt caaacaattc agcccagttt 3540 aaatagtgtt aatgaaggtgggcagaagat aaagagtgaa gctgaacttg agtttgcatc 3600 cagactggag acagaacttagagagcttaa cactcagtgg gatcacatat gccgccaggt 3660 ctacaccaga aaggaagccttaaaggcagg tttggataaa accgtaagcc tccaaaaaga 3720 tctatcagag atgcatgagtggatgacaca agctgaagaa gaatatctag agagagattt 3780 tgaatataaa actccagatgaattacagac tgctgttgaa gaaatgaaga gagctaaaga 3840 agaggcacta caaaaagaaactaaagtgaa actccttact gagactgtaa atagtgtaat 3900 agctcacgct ccaccctcagcacaagaggc cttaaaaaag gaacttgaaa ctctgaccac 3960 caactaccaa tggctgtgcaccaggctgaa tggaaaatgc aaaactttgg aagaagtttg 4020 ggcatgttgg catgagttattgtcatattt agagaaagca aacaagtggc tcaatgaagt 4080 agaattgaaa cttaaaaccatggaaaatgt tcctgcagga cctgaggaaa tcactgaagt 4140 gctagaatct cttgaaaatctgatgcatca ttcagaggag aacccaaatc agattcgtct 4200 attggcacag actcttacagatggaggagt catggatgaa ctgatcaatg aggagcttga 4260 gacgtttaat tctcgttggagggaactaca tgaagaggct gtgaggaaac aaaagttgct 4320 tgaacagagt atccagtctgcccaggaaat tgaaaagtcc ttgcacttaa ttcaggagtc 4380 gcttgaattc attgacaagcagttggcagc ttatatcact gacaaggtgg atgcagctca 4440 aatgcctcag gaagcccagaaaatccaatc agatttgaca agtcatgaga taagtttaga 4500 agaaatgaag aaacataaccaggggaagga tgccaaccaa agggttcttt cacaaattga 4560 tgttgcacag aaaaaattacaagatgtctc catgaaattt cgattattcc aaaaaccagc 4620 caattttgaa caacgtctagaggaaagtaa gatgatttta gatgaagtca agatgcattt 4680 gcctgcattg gaaaccaagagtgttgaaca ggaagtaatt cagtcacaac taagtcattg 4740 tgtgaacttg tataaaagcctgagtgaagt caagtctgaa 
gtggaaatgg tgattaaaac 4800 cggacgtcaa attgtacagaaaaagcagac agaaaatccc aaagagcttg atgaacgagt 4860 aacagctttg aaattgcattacaatgagtt gggtgcgaag gtaacagaga gaaagcaaca 4920 gttggagaaa tgcttgaagttgtcccgtaa gatgagaaag gaaatgaatg tcttaacaga 4980 atggctggca gcaacagatacagaattgac gaagagatca gcagttgaag gaatgccaag 5040 taatttggat tctgaagttgcctggggaaa ggctactcaa aaagagattg agaaacagaa 5100 ggctcacttg aagagtgttacagaattagg agagtctttg aaaatggtgt tgggcaagaa 5160 agaaaccttg gtagaagataaactgagtct tctgaacagt aactggatag ctgtcacctc 5220 cagagtagaa gaatggctaaatcttttgtt ggaataccag aaacacatgg aaacctttga 5280 tcagaacata gaacaaatcacaaagtggat cattcatgca gatgaacttt tagatgagtc 5340 tgaaaagaag aaaccacaacaaaaggaaga cattcttaag cgtttaaagg ctgaaatgaa 5400 tgacatgcgc ccaaaggtggactccacacg tgaccaagca gcaaaattga tggcaaaccg 5460 cggtgaccac tgcaggaaagtagtagagcc ccaaatctct gagctcaacc gtcgatttgc 5520 agctatttct cacagaattaagactggaaa ggcctccatt cctttgaagg aattggagca 5580 gtttaactca gatatacaaaaattgcttga accactggag gctgaaattc agcagggggt 5640 gaatctgaaa gaggaagacttcaataaaga tatgagtgaa gacaatgagg gtactgtaaa 5700 tgaattgttg caaagaggagacaacttaca acaaagaatc acagatgaga gaaagcgaga 5760 ggaaataaag ataaaacagcagctgttaca gacaaaacat aatgctctca aggatttgag 5820 gtctcaaaga agaaaaaaggccctagaaat ttctcaccag tggtatcagt acaagaggca 5880 ggctgatgat ctcctgaaatgcttggatga aattgaaaaa aaattagcca gcctacctga 5940 acccagagat gaaagaaaattaaaggaaat tgatcgtgaa ttgcagaaga agaaagagga 6000 gctgaatgca gtgcgcaggcaagctgaggg cttgtctgag aatggggccg caatggcagt 6060 ggagccaact cagatccagctcagcaagcg ctggcggcaa attgagagca attttgctca 6120 gtttcgaaga ctcaactttgcacaaattca cactctccat gaagaaacta tggtagtgac 6180 gactgaagat atgcctttggatgtttctta tgtgccttct acttatttga ccgagatcag 6240 tcatatctta caagctctttcagaagttga tcatcttcta aatactcctg aactctgtgc 6300 taaagatttt gaagatctttttaagcaaga ggagtctctt aagaatataa aagacaattt 6360 gcaacaaatc tcaggtcggattgatattat tcacaagaag aagacagcag ccttgcaaag 6420 tgccacctcc atggaaaaggtgaaagtaca ggaagccgtg gcacagatgg atttccaggg 6480 ggaaaaactt 
catagaatgtacaaggaacg acaagggcga ttcgacagat cagttgaaaa 6540 atggcgacac tttcattatgatatgaaggt atttaatcaa tggctgaatg aagttgaaca 6600 gtttttcaaa aagacacaaaatcctgaaaa ctgggaacat gctaaataca aatggtatct 6660 taaggaactc caggatggcattgggcagcg tcaagctgtt gtcagaacac tgaatgcaac 6720 tggggaagaa ataattcaacagtcttcaaa aacagatgtc aatattctac aagaaaaatt 6780 aggaagcttg agtctgcggtggcacgacat ctgcaaagag ctggcagaaa ggagaaagag 6840 gattgaagaa caaaagaatgtcttgtcaga atttcaaaga gatttaaatg aatttgtttt 6900 gtggctggaa gaagcagataacattgctat tactccactt ggagatgagc agcagctaaa 6960 agaacaactt gaacaagtcaagttactggc agaagagttg cccctgcgcc agggaattct 7020 aaaacaatta aatgaaacaggaggagcagt acttgtaagt gctcccataa ggccagaaga 7080 gcaagataaa cttgaaaagaagctcaaaca gacaaatctc cagtggataa aggtctccag 7140 agctttacct gagaaacaaggagagcttga ggttcactta aaagatttta ggcagcttga 7200 agagcagctg gatcacctgcttctgtggct ctctcctatt agaaaccagt tggaaattta 7260 taaccaacca agtcaggcaggaccgtttga cataaaggag attgaagtaa cagttcacgg 7320 taaacaagcg gatgtggaaaggcttttgtc gaaagggcag catttgtata aggaaaaacc 7380 aagcactcag ccagtgaagaggaagttaga agatctgagg tctgagtggg aggctgtaaa 7440 ccatttactt cgggagctgaggacaaagca gcctgaccgt gcccctggac tgagcactac 7500 tggagcctct gccagtcagactgttactct agtgacacaa tctgtggtta ctaaggaaac 7560 tgtcatctcc aaactagaaatgccatcttc tttgctgttg gaggtacctg cactggcaga 7620 cttcaaccga gcttggacagaacttacaga ctggctgtct ctgcttgatc gagttataaa 7680 atcacagaga gtgatggtgggtgatctgga agacatcaat gaaatgatca tcaaacagaa 7740 ggcaacactg caagatttggaacagagacg cccccaattg gaagaactca ttactgctgc 7800 ccagaatttg aaaaacaaaaccagcaatca agaagctaga acaatcatta ctgatcgaat 7860 tgaaagaatt cagattcagtgggatgaggt tcaagaacag ctgcagaaca ggagacaaca 7920 gttgaatgaa atgttaaaggattcaacaca atggctggaa gctaaggaag aagccgaaca 7980 ggtcatagga caggtcagaggcaagcttga ctcatggaaa gaaggtcctc acacagtaga 8040 tgcaatccaa aagaagatcacagaaaccaa gcagttggcc aaagacctcc gtcaacggca 8100 gataagtgta gacgtggcaaatgatttggc actgaaactt cttcgggact attctgctga 8160 tgataccaga aaagtacacatgataacaga gaatatcaat 
acttcttggg gaaacattca 8220 taaaagagta agtgagcaagaggctgcttt ggaagaaact catagattac tgcagcagtt 8280 ccctctggac ctggagaagtttctttcctg gattacggaa gcagaaacaa ctgccaatgt 8340 cctacaggac gcttcccgtaaggagaagct cctagaagac tccaggggag tcagagagct 8400 gatgaaacca tggcaagatctccaaggaga aattgaaact cacacagata tctatcacaa 8460 tcttgatgaa aatggccaaaaaatcctgag atccctggaa ggttcggatg aagcacccct 8520 gttacaaaga cgtttggataacatgaattt caagtggagt gaacttcaga aaaagtctct 8580 caacattagg tcccatttggaagcaagttc tgaccagtgg aagcgtttgc atctttctct 8640 tcaggaactt cttgtttggctacagctgaa agatgatgaa ctgagccgtc aggcacccat 8700 cggtggtgat ttcccagcagttcagaagca gaatgatata catagggcct tcaagaggga 8760 attgaaaact aaagaacctgtaatcatgag tactctggag actgtgagaa tatttctgac 8820 agagcagcct ttggaaggactagagaaact ctaccaggag cccagagaac tgcctcctga 8880 agaaagagct cagaatgtcactcggctcct acgaaagcag gctgaagagg tcaacgctga 8940 atgggacaaa ttgaacctgcgctcagctga ttggcagaga aaaatagatg aagctcttga 9000 aagactccag gaacttcaggaagctgccga tgaactggac ctcaagttgc gccaagctga 9060 ggtgatcaag ggatcctggcagccagtggg ggatctcctc attgactctc tgcaagatca 9120 ccttgaaaaa gtcaaggcacttcggggaga aattgcacct cttaaagaga atgtcaatcg 9180 tgtcaatgac cttgcacatcagctgaccac actgggcatt cagctctcac cttataacct 9240 cagcactttg gaagatctgaataccagatg gaggcttcta caggtggctg tggaggaccg 9300 tgtcagacag ctgcatgaagcccacaggga ctttggtcct gcatcccagc acttcctttc 9360 cacttcagtt cagggtccctgggagagagc catctcacca aacaaagtgc cctactatat 9420 caaccacgag acccaaaccacttgttggga ccaccccaaa atgacagagc tctaccagtc 9480 tttagctgac ctgaataatgtcaggttctc cgcgtatagg actgccatga agctcagaag 9540 gctccagaag gccctttgcttggatctctt gagcctgtca gctgcatgtg atgccctgga 9600 ccagcacaac ctcaagcaaaatgaccagcc catggatatc ctgcagataa ttaactgttt 9660 gactacaatt tatgatcgtctggagcaaga gcacaacaat ctggtcaatg tccctctctg 9720 tgtggatatg tgtctcaactggcttctcaa tgtttatgat acgggacgaa cagggaggat 9780 ccgtgtcctg tcttttaaaactggcatcat ttctctgtgt aaagcacact tggaagacaa 9840 gtacagatac cttttcaagcaagtggcaag ttcaactggc ttttgtgacc agcgtaggct 9900 gggtcttctt 
ctgcatgattctattcaaat cccaagacag ttgggtgaag ttgcttcctt 9960 tgggggcagt aacattgagccgagtgtcag gagctgcttc caatttgcca ataataaacc 10020 tgagattgaa gctgctctcttccttgactg gatgcgcctg gaaccccagt ctatggtgtg 10080 gctgcccgtc ttgcacagagtggctgctgc tgaaactgcc aagcatcaag ccaagtgtaa 10140 catctgtaag gagtgtccaatcattggatt caggtacaga agcctaaagc attttaatta 10200 tgacatctgc caaagttgctttttttctgg ccgagttgca aagggccata aaatgcacta 10260 ccccatggta gagtattgcactccgactac atccggagaa gatgttcgcg acttcgccaa 10320 ggtactaaaa aacaaatttcgaaccaaaag gtattttgcg aagcatcccc gaatgggcta 10380 cctgccagtg cagactgtgttagaggggga caacatggaa actcccgtta ctctgatcaa 10440 cttctggcca gtagattctgcgcctgcctc gtccccccag ctttcacacg atgatactca 10500 ttcacgcatt gaacattatgctagcaggct agcagaaatg gaaaacagca atggatctta 10560 tctaaatgat agcatctctcctaatgagag catagatgat gaacatttgt taatccagca 10620 ttactgccaa agtttgaaccaggactcccc cctgagccag cctcgtagtc ctgcccagat 10680 cttgatttcc ttagagagtgaggaaagagg ggagctagag agaatcctag cagatcttga 10740 ggaagaaaac aggaatctgcaagcagaata tgatcgcctg aagcagcagc atgagcataa 10800 aggcctgtct ccactgccatctcctcctga gatgatgccc acctctcctc agagtcccag 10860 ggatgctgag ctcattgctgaggctaagct actgcgccaa cacaaaggac gcctggaagc 10920 caggatgcaa atcctggaagaccacaataa acagctggag tctcagttac atagactgag 10980 acagctcctg gagcagccccaggctgaagc taaggtgaat ggcaccacgg tgtcctctcc 11040 ttccacctct ctgcagaggtcagatagcag tcagcctatg ctgctccgag tggttggcag 11100 tcaaacttca gaatctatgggtgaggaaga tcttctgagt cctccccagg acacaagcac 11160 agggttagaa gaagtgatggagcaactcaa caactccttc cctagttcaa gaggaagaaa 11220 tgcccccgga aagccaatgagagaggacac aatgtaggaa gccttttcca catggcagat 11280 gatttgggca gagcgatggagtccttagtt tcagtcatga cagatgaaga aggagcagaa 11340 taaatgtttt acaactcctgattcccgcat ggtttttata atattcgtac aacaaagagg 11400 attagacagt aagagtttacaagaaataaa atctatattt ttgtgaaggg tagtggtact 11460 atactgtaga tttcagtagtttctaagtct gttattgttt tgttaacaat ggcaggtttt 11520 acacgtctat gcaattgtacaaaaaagtta aaagaaaaca tgtaaaatct tgatagctaa 11580 ataacttgcc 
atttctttatatggaacgca ttttgggttg tttaaaaatt tataacagtt 11640 ataaagagag attgtaaactaaagtgtgct ttataaaaaa agttgtttat aaaaacccct 11700 aaacaaacac acacgcacacacacacacac acacacacac acacacacac gcacacatac 11760 atgcacgaac ccaccacacacacacacaca cacacacaca ctgaggcagc acattgtttt 11820 gcattacttt agcgtggtattcatatggaa ttcatgacgt ttttttattt tcttgcatac 11880 gaaccccacc aaatgactgcttcatattgc tcttttgaga attgttgact gagtggggct 11940 ggctatgggc tttcattttatacatctata tgtctacaag tatataaata ctataggtat 12000 atagataaat agatatgaagttacttcttc aaatgttctt gccacttcct aatggaaatt 12060 gcttctagtc atctgggcttatctgcttgg gcaagagtga attttccctg gagcccaaag 12120 ccaggagact accgccacactaaaatattg tctagggctc cagatgtttc tagttttaaa 12180 ctttccactg agagctagaggattcatttt tttcaaggaa catgcgaatg aatacacagg 12240 acttactatc atagtaatttgttggctgat atattcaact tcctactgtt gggttatatt 12300 taatgatgtt tctgcaatagaacatcagat gacattttta actcccagac agtaggagga 12360 agatggtagg agctaaaggttgcggctcct cagtcaattt atatgagggg agcaacaact 12420 ctgtaaaaga atggatgaatatttacaact atacatataa acatctctat aattacaact 12480 aaattgttct gccctcttcataaactcaac ctgaagtggg tggttttgtt gttgttgttg 12540 ttgttgttgt tgatgatgatgatgaatttt agattttaga ttttttgggt ttttttttct 12600 tcattgtgat gatttttttttttaatgctg caagacttag gattactgtt aagaaagtaa 12660 cccaatcaca ttgtgaccctggtgaatatc agtccagaag cccatgaact gcatttgtct 12720 cctttgcatt ggtttccctgcaagtaactc cacacaggat tgtgggtgag aaggcacagt 12780 ggttggaaag ttttgagagcaaaagcgtct ccaaactctc tggtctagtt gacgggctga 12840 aatgtctaaa caaatgcaagtcattgaacc aggagaaaaa gtgcaacaga aagctaagga 12900 ctgctaggaa gagctttactcctctcatgc cagtttcttc ttcttagcat ttaaagagca 12960 ttctctcaat agaaatcactgtcctatcat tttgcaaatc tgttacctct aacgtcaagt 13020 gtaattaact tctagcgagtgggttttgtc cattattaat tgtaattaac atcaaacaca 13080 gcttctcatg ctatttctacctcactttgg ttttggggtg tttctagtaa ttgtgcacac 13140 ctaatttcac aacttcaccacttgtctgtt gtgtggacac cagtttcctt ttttcattta 13200 taatttccaa aagaaaacccaaagctctaa gataacaaat tgaaatttgg ttctggtctt 13260 gcttttctct 
ctctctctcctttatgtggc actgggcatt ttctttatcc aaggatttgt 13320 tttcaccaag atttaaaacaaggggttcct ttcctactaa gaagttttaa gtttcattct 13380 aaaatccaag gtagatagagtgcatagttt tgttttaatc ttttcgtttt atcttttaga 13440 tattagttct ggagtgaatctatcaaaata tttgaataaa aactgagagc tttattgctg 13500 attttaagca taatttggacatcatttcat gttctttata accatcaagt attaaagtgt 13560 aaatcataat cagtgtaactgaagcataat catcacatgg catgtatcat cattgtctcc 13620 aggtactgga ctcttacttgagtatcataa tagattgtgt tttaacacca acactgtaac 13680 atttactaat tatttttttaaacttcagtt ttactgcatt ttcacaacat atcagatttc 13740 accaaatata tgccttactattgtattata ttactgcttt actgtgtatc tcaataaagc 13800 acgcagttat gttac 13815
3 10302 DNA Homo sapiens 3
atggccaagt atggagaaca tgaagccagt cctgacaatgggcagaacga attcagtgat 60 atcattaagt ccagatctga tgaacacaat gacgtacagaagaaaacctt taccaaatgg 120 ataaatgctc gattttcaaa gagtgggaaa ccacccatcaatgatatgtt cacagacctc 180 aaagatggaa ggaagctatt ggatcttcta gaaggcctcacaggaacatc actgccaaag 240 gaacgtggtt ccacaagggt acatgcctta aataacgtcaacagagtgct gcaggtttta 300 catcagaaca atgtggaatt agtgaatata gggggaactgacattgtgga tggaaatcac 360 aaactgactt tggggttact ttggagcatc attttgcactggcaggtgaa agatgtcatg 420 aaggatgtca tgtcggacct gcagcagacg aacagtgagaagatcctgct cagctgggtg 480 cgtcagacca ccaggcccta cagccaagtc aacgtcctcaacttcaccac cagctggaca 540 gatggactcg cctttaatgc tgtcctccac cgacataaacctgatctctt cagctgggat 600 aaagttgtca aaatgtcacc aattgagaga cttgaacatgccttcagcaa ggctcaaact 660 tatttgggaa ttgaaaagct gttagatcct gaagatgttgccgttcggct tcctgacaag 720 aaatccataa ttatgtattt aacatctttg tttgaggtgctacctcagca agtcaccata 780 gacgccatcc gtgaggtaga gacactccca aggaaatataaaaaagaatg tgaagaagag 840 gcaattaata tacagagtac agcgcctgag gaggagcatgagagtccccg agctgaaact 900 cccagcactg tcactgaggt cgacatggat ctggacagctatcagattgc gttggaggaa 960 gtgctgacct ggttgctttc tgctgaggac actttccaggagcaggatga tatttctgat 1020 gatgttgaag aagtcaaaga ccagtttgca acccatgaagcttttatgat ggaactgact 1080 gcacaccaga gcagtgtggg cagcgtcctg caggcaggcaaccaactgat aacacaagga 1140
actctgtcag acgaagaaga atttgagatt caggaacagatgaccctgct gaatgctaga 1200 tgggaggctc ttagggtgga gagtatggac agacagtcccggctgcacga tgtgctgatg 1260 gaactgcaga agaagcaact gcagcagctc tccgcctggttaacactcac agaggagcgc 1320 attcagaaga tggaaacttg ccccctggat gatgatgtaaaatctctaca aaagctgcta 1380 gaagaacata aaagtttgca aagtgatctt gaggctgaacaggtgaaagt aaattcacta 1440 actcacatgg tggtcattgt tgatgaaaac agtggtgagagcgctacagc tatcctagaa 1500 gaccagttac agaaacttgg tgagcgctgg acagcagtatgccgttggac tgaagaacgc 1560 tggaataggt tacaagaaat caatatattg tggcaggaattattggaaga acagtgcttg 1620 ttgaaagctt ggttaaccga aaaagaagag gctttaaataaagtccagac aagcaacttc 1680 aaagaccaaa aggaactaag tgtcagtgtt cgacgtctggctattttgaa ggaagacatg 1740 gaaatgaagc gtcaaacatt ggatcagctg agtgagattggccaggatgt gggacaatta 1800 cttgataatt ccaaggcatc taagaagatc aacagtgactcagaggaact gactcaaaga 1860 tgggattctt tggttcagag actagaagat tcctccaaccaggtgactca ggctgtagca 1920 aagctgggga tgtctcagat tcctcagaag gaccttttggagactgttcg tgtaagagaa 1980 caagcaatta caaaaaaatc taagcaggaa ctgcctcctcctcctccccc aaagaagaga 2040 cagatccatg tggatattga agctaagaaa aagtttgatgctataagtgc agagctgttg 2100 aactggattt tgaaatggaa aactgccatt cagaccacagagataaaaga gtatatgaag 2160 atgcaagaca cttccgaaat gaaaaagaag ttgaaggcattagaaaaaga acagagagaa 2220 agaatcccca gagcagatga attaaaccaa actggacaaatccttgtgga gcaaatggga 2280 aaagaaggcc ttcctactga agaaataaaa aatgttctggagaaggtttc atcagaatgg 2340 aagaatgtat ctcaacattt ggaagatcta gaaagaaagattcagctaca ggaagatata 2400 aatgcttatt tcaagcagct tgatgagctt gaaaaggtcatcaagacaaa ggaggagtgg 2460 gtaaaacaca cttccatttc tgaatcttcc cggcagtccttgccaagctt gaaggattcc 2520 tgtcagcggg aattgacaaa tcttcttggc cttcaccccaaaattgaaat ggctcgtgca 2580 agctgctcgg ccctgatgtc tcagccttct gccccagattttgtccagcg gggcttcgat 2640 agctttctgg gccgctacca agctgtacaa gaggctgtagaggatcgtca acaacatcta 2700 gagaatgaac tgaagggcca acctggacat gcatatctggaaacattgaa aacactgaaa 2760 gatgtgctaa atgattcaga aaataaggcc caggtgtctctgaatgtcct taatgatctt 2820 gccaaggtgg agaaggccct gcaagaaaaa 
aagacccttgatgaaatcct tgagaatcag 2880 aaacctgcat tacataaact tgcagaagaa acaaaggctctggagaaaaa tgttcatcct 2940 gatgtagaaa aattatataa gcaagaattt gatgatgtgcaaggaaagtg gaacaagcta 3000 aaggtcttgg tttccaaaga tctacatttg cttgaggaaattgctctcac actcagagct 3060 tttgaggccg attcaacagt cattgagaag tggatggatggcgtgaaaga cttcttaatg 3120 aaacagcagg ctgcccaagg agacgacgca ggtctacagaggcagttaga ccagtgctct 3180 gcatttgtta atgaaataga aacaattgaa tcatctctgaaaaacatgaa ggaaatagag 3240 actaatcttc gaagtggtcc agttgctgga ataaaaacttgggtgcagac aagactaggt 3300 gactaccaaa ctcaactgga gaaacttagc aaggagatcgctactcaaaa aagtaggttg 3360 tctgaaagtc aagaaaaagc tgcgaacctg aagaaagacttggcagagat gcaggaatgg 3420 atgacccagg ccgaggaaga atatttggag cgggattttgagtacaagtc accagaagag 3480 cttgagagtg ctgtggaaga gatgaagagg gcaaaagaggatgtgttgca gaaggaggtg 3540 agagtgaaga ttctcaagga caacatcaag ttattagctgccaaggtgcc ctctggtggc 3600 caggagttga cgtctgagct gaatgttgtg ctggagaattaccaacttct ttgtaataga 3660 attcgaggaa agtgccacac gctagaggag gtctggtcttgttggattga actgcttcac 3720 tatttggatc ttgaaactac ctggttaaac actttggaagagcggatgaa gagcacagag 3780 gtcctgcctg agaagacgga tgctgtcaac gaagccctggagtctctgga atctgttctg 3840 cgccacccgg cagataatcg cacccagatt cgagagcttggccagactct gattgatggg 3900 gggatcctgg atgatataat cagtgagaaa ctggaggctttcaacagccg atatgaagat 3960 ctaagtcacc tggcagagag caagcagatt tctttggaaaagcaactcca ggtgctgcgg 4020 gaaactgacc agatgcttca agtcttgcaa gagagcttgggggagctgga caaacagctc 4080 accacatacc tgactgacag gatagatgct ttccaagttccacaggaagc tcagaaaatc 4140 caagcagaga tctcagccca tgagctaacc ctagaggagttgagaagaaa tatgcgttct 4200 cagcccctga cctccccaga gagtaggact gccagaggaggaagtcagat ggatgtgcta 4260 cagaggaaac tccgagaggt gtccacaaag ttccagcttttccagaagcc agctaacttc 4320 gagcagcgca tgctggactg caagcgtgtg ctggatggcgtgaaagcaga acttcacgtt 4380 ctggatgtga aggacgtaga ccctgacgtc atacagacgcacctggacaa gtgtatgaaa 4440 ctgtataaaa ctttgagtga agtcaaactt gaagtggaaactgtgattaa aacaggaaga 4500 catattgtcc agaaacagca aacggacaac ccaaaagggatggatgagca gctgacttcc 4560 
ctgaaggttc tttacaatga cctgggcgca caggtgacagaaggaaaaca ggatctggaa 4620 agagcatcac agttggcccg gaaaatgaag aaagaggctgcttctctctc tgaatggctt 4680 tctgctactg aaactgaatt ggtacagaag tccacttcagaaggtctgct tggtgacttg 4740 gatacagaaa tttcctgggc taaaaatgtt ctgaaggatctggaaaagag aaaagctgat 4800 ttaaatacca tcacagagag tagtgctgcc ctgcaaaacttgattgaggg cagtgagcct 4860 attttagaag agaggctctg cgtccttaac gctgggtggagccgagttcg tacctggact 4920 gaagattggt gcaatacctt gatgaaccat cagaaccagctagaaatatt tgatgggaac 4980 gtggctcaca taagtacctg gctttatcaa gctgaagctctattggatga aattgaaaag 5040 aaaccaacaa gtaaacagga agaaattgtg aagcgtttagtatctgagct ggatgatgcc 5100 aacctccagg ttgaaaatgt ccgcgatcaa gcccttattttgatgaatgc ccgtggaagc 5160 tcaagcaggg agcttgtaga accaaagtta gctgagctgaataggaactt tgaaaaggtg 5220 tctcaacata tcaaaagtgc caaattgcta attgctcaggaaccattata ccaatgtttg 5280 gtcaccactg aaacatttga aactggtgtg cctttctctgacttggaaaa attagaaaat 5340 gacatagaaa atatgttaaa atttgtggaa aaacacttggaatccagtga tgaagatgaa 5400 aagatggatg aggagagtgc ccagattgag gaagttctacaaagaggaga agaaatgtta 5460 catcaaccta tggaagataa taaaaaagaa aagatccgtttgcaattatt acttttgcat 5520 actagataca acaaaattaa ggcaatccct attcaacagaggaaaatggg tcaacttgct 5580 tctggaatta gatcatcact tcttcctaca gattatctggttgaaattaa caaaatttta 5640 ctttgcatgg atgatgttga attatcgctt aatgttccagagctcaacac tgctatttac 5700 gaagacttct cttttcagga agactctctg aagaatatcaaagaccaact ggacaaactt 5760 ggagagcaga ttgcagtcat tcatgaaaaa cagccagatgtcatccttga agcctctgga 5820 cctgaagcca ttcagatcag agatacactt actcagctgaatgcaaaatg ggacagaatt 5880 aatagaatgt acagtgatcg gaaaggttgt tttgacagggcaatggaaga atggagacag 5940 ttccattgtg accttaatga cctcacacag tggataacagaggctgaaga attactggtt 6000 gatacctgtg ctccaggtgg cagcctggac ttagagaaagccaggataca tcagcaggaa 6060 cttgaggtgg gcatcagcag ccaccagccc agttttgcagcactaaaccg aactggggat 6120 gggattgtgc agaaactctc ccaggcagat ggaagcttcttgaaagaaaa actggcaggt 6180 ttaaaccaac gctgggatgc aattgttgca gaagtgaaggataggcagcc aaggctaaaa 6240 ggagaaagta agcaggtgat gaagtacagg 
catcagctagatgagattat ctgttggtta 6300 acaaaggctg agcatgctat gcaaaagaga tcaaccaccgaattgggaga aaacctgcaa 6360 gaattaagag acttaactca agaaatggaa gtacatgctgaaaaactcaa atggctgaat 6420 agaactgaat tggagatgct ttcagataaa agtctgagtttacctgaaag ggataaaatt 6480 tcagaaagct taaggactgt aaatatgaca tggaataagatttgcagaga ggtgcctacc 6540 accctgaagg aatgcatcca ggagcccagt tctgtttcacagacaaggat tgctgctcat 6600 cctaatgtcc aaaaggtggt gctagtatca tctgcgtcagatattcctgt tcagtctcat 6660 cgtacttcgg aaatttcaat tcctgctgat cttgataaaactataacaga actagccgac 6720 tggctggtat taatcgacca gatgctgaag tccaacattgtcactgttgg ggatgtagaa 6780 gagatcaata agaccgtttc ccgaatgaaa attacaaaggctgacttaga acagcgccat 6840 cctcagctgg attatgtttt tacattggca cagaatttgaaaaataaagc ttccagttca 6900 gatatgagaa cagcaattac agaaaaattg gaaagggtcaagaaccagtg ggatggcacc 6960 cagcatggcg ttgagctaag acagcagcag cttgaggacatgattattga cagtcttcag 7020 tgggatgacc atagggagga gactgaagaa ctgatgagaaaatatgaggc tcgactctat 7080 attcttcagc aagcccgacg ggatccactc accaaacaaatttctgataa ccaaatactg 7140 cttcaagaac tgggtcctgg agatggtatc gtcatggcgttcgataacgt cctgcagaaa 7200 ctcctggagg aatatgggag tgatgacaca aggaatgtgaaagaaaccac agagtactta 7260 aaaacatcat ggatcaatct caaacaaagt attgctgacagacagaacgc cttggaggct 7320 gagtggagga cggtgcaggc ctctcgcaga gatctggaaaacttcctgaa gtggatccaa 7380 gaagcagaga ccacagtgaa tgtgcttgtg gatgcctctcatcgggagaa tgctcttcag 7440 gatagtatct tggccaggga actcaaacag cagatgcaggacatccaggc agaaattgat 7500 gcccacaatg acatatttaa aagcattgac ggaaacaggcagaagatggt aaaagctttg 7560 ggaaattctg aagaggctac tatgcttcaa catcgactggatgatatgaa ccaaagatgg 7620 aatgacttaa aagcaaaatc tgctagcatc agggcccatttggaggccag cgctgagaag 7680 tggaacaggt tgctgatgtc cttagaagaa ctgatcaaatggctgaatat gaaagatgaa 7740 gagcttaaga aacaaatgcc tattggagga gatgttccagccttacagct ccagtatgac 7800 cattgtaagg ccctgagacg ggagttaaag gagaaagaatattctgtcct gaatgctgtc 7860 gaccaggccc gagttttctt ggctgatcag ccaattgaggcccctgaaga gccaagaaga 7920 aacctacaat caaaaacaga attaactcct gaggagagagcccaaaagat tgccaaagcc 7980 
atgcgcaaac agtcttctga agtcaaagaa aaatgggaaagtctaaatgc tgtaactagc 8040 aattggcaaa agcaagtgga caaggcattg gagaaactcagagacctgca gggagctatg 8100 gatgacctgg acgctgacat gaaggaggca gagtccgtgcggaatggctg gaagcccgtg 8160 ggagacttac tcattgactc gctgcaggat cacattgaaaaaatcatggc atttagagaa 8220 gaaattgcac caatcaactt taaagttaaa acggtgaatgatttatccag tcagctgtct 8280 ccacttgacc tgcatccctc tctaaagatg tctcgccagctagatgacct taatatgcga 8340 tggaaacttt tacaggtttc tgtggatgat cgccttaaacagcttcagga agcccacaga 8400 gattttggac catcctctca gcattttctc tctacgtcagtccagctgcc gtggcaaaga 8460 tccatttcac ataataaagt gccctattac atcaaccatcaaacacagac cacctgttgg 8520 gaccatccta aaatgaccga actctttcaa tcccttgctgacctgaataa tgtacgtttt 8580 tctgcctacc gtacagcaat caaaatccga agactacaaaaagcactatg tttggatctc 8640 ttagagttga gtacaacaaa tgaaattttc aaacagcacaagttgaacca aaatgaccag 8700 ctcctcagtg ttccagatgt catcaactgt ctgacaacaacttatgatgg acttgagcaa 8760 atgcataagg acctggtcaa cgttccactc tgtgttgatatgtgtctcaa ttggttgctc 8820 aatgtctatg acacgggtcg aactggaaaa attagagtgcagagtctgaa gattggatta 8880 atgtctctct ccaaaggtct cttggaagaa aaatacagatatctctttaa ggaagttgcg 8940 gggccgacag aaatgtgtga ccagaggcag ctgggcctgttacttcatga tgccatccag 9000 atcccccggc agctaggtga agtagcagct tttggaggcagtaatattga gcctagtgtt 9060 cgcagctgct tccaacagaa taacaataaa ccagaaataagtgtgaaaga gtttatagat 9120 tggatgcatt tggaaccaca gtccatggtt tggctcccagttttacatcg agtggcagca 9180 gcggagactg caaaacatca ggccaaatgc aacatctgtaaagaatgtcc aattgtcggg 9240 ttcaggtata gaagccttaa gcattttaac tatgatgtctgccagagttg tttcttttcg 9300 ggtcgaacag caaaaggtca caaattacat tacccaatggtggaatattg tatacctaca 9360 acatctgggg aagatgtacg agacttcaca aaggtacttaagaacaagtt caggtcgaag 9420 aagtactttg ccaaacaccc tcgacttggt tacctgcctgtccagacagt tcttgaaggt 9480 gacaacttag agactcctat cacactcatc agtatgtggccagagcacta tgacccctca 9540 caatctcctc aactgtttca tgatgacacc cattcaagaatagaacaata tgccacacga 9600 ctggcccaga tggaaaggac taatgggtct tttctcactgatagcagctc caccacagga 9660 agtgtggaag acgagcacgc cctcatccag 
cagtattgccaaacactcgg aggagagtcc 9720 ccagtgagcc agccgcagag cccagctcag atcctgaagtcagtagagag ggaagaacgt 9780 ggagaactgg agaggatcat tgctgacctg gaggaagaacaaagaaatct acaggtggag 9840 tatgagcagc tgaaggacca gcacctccga agggggctccctgtcggttc accgccagag 9900 tcgattatat ctccccatca cacgtctgag gattcagaacttatagcaga agcaaaactc 9960 ctcaggcagc acaaaggtcg gctggaggct aggatgcagattttagaaga tcacaataaa 10020 cagctggagt ctcagctcca ccgcctccga cagctgctggagcagcctga atctgattcc 10080 cgaatcaatg gtgtttcccc atgggcttct cctcagcattctgcactgag ctactcgctt 10140 gatccagatg cctccggccc acagttccac caggcagcgggagaggacct gctggcccca 10200 ccgcacgaca ccagcacgga tctcacggag gtcatggagcagattcacag cacgtttcca 10260 tcttgctgcc caaatgttcc cagcaggcca caggcaatgtga 10302 4 11096 DNA Mus musculus 4 atggccaagt atggggacct tgaagccaggcctgatgatg ggcagaacga attcagtgac 60 atcattaagt ccagatctga tgaacacaatgatgtacaga agaaaacctt taccaaatgg 120 ataaacgctc gattttccaa gagtgggaaaccacccatca gtgatatgtt ctcagacctc 180 aaagatggga gaaagctctt ggatcttctcgaaggcctca caggaacatc attgccaaag 240 gaacgtggtt ccacaagggt gcatgccttaaacaatgtca accgagtgct acaggtttta 300 catcagaaca atgtggactt ggtgaatattggaggcacgg acattgtggc tggaaatccc 360 aagctgactt tagggttact ctggagcatcattctgcact ggcaggtgaa ggatgtcatg 420 aaagatatca tgtcagacct gcagcagacaaacagcgaga agatcctgct gagctgggtg 480 cggcagacca ccaggcccta cagtcaagtcaacgtcctca acttcaccac cagctggacc 540 gatggactcg cgttcaacgc cgtgctccaccggcacaaac cagatctctt cgactgggac 600 gagatggtca aaatgtcccc aattgagagacttgaccatg cttttgacaa ggcccacact 660 tctttgggaa ttgaaaagct cctaagtcctgaaactgttg ctgtgcatct ccctgacaag 720 aaatccataa ttatgtattt aacgtctctgtttgaggtgc ttcctcagca agtcacgata 780 gatgccatcc gagaggtgga gactctcccaaggaagtata agaaagaatg tgaagaggaa 840 gaaattcata tccagagtgc agtgctggcagaggaaggcc agagtccccg agctgagacc 900 cctagcaccg tcactgaagt ggacatggatttggacagct accagatagc gctagaggaa 960 gtgctgacgt ggctgctgtc cgcggaggacacgttccagg agcaacatga catttctgat 1020 gatgtcgaag aagtcaaaga gcagtttgctacccatgaaa cttttatgat ggagctgaca 1080 
gcacaccaga gcagcgtggg gagcgtcctgcaggctggca accagctgat gacacaaggg 1140 actctgtcca gagaggagga gtttgagatccaggaacaga tgaccttgct gaatgcaagg 1200 tgggaggcgc tccgggtgga gagcatggagaggcagtccc ggctgcacga cgctctgatg 1260 gagctgcaga agaaacagct gcagcagctctcaagctggc tggccctcac agaagagcgc 1320 attcagaaga tggagagcct cccgctgggtgatgacctgc cctccctgca gaagctgctt 1380 caagaacata aaagtttgca aaatgaccttgaagctgaac aggtgaaggt aaattcctta 1440 actcacatgg tggtgattgt ggatgaaaacagtggggaga gtgccacagc tcttctggaa 1500 gatcagttac agaaactggg tgagcgctggacagctgtat gccgctggac tgaagaacgt 1560 tggaacaggt tgcaagaaat cagtattctgtggcaggaat tattggaaga gcagtgtctg 1620 ttggaggctt ggctcaccga aaaggaagaggctttggata aagttcaaac cagcaacttt 1680 aaagaccaga aggaactaag tgtcagtgtccggcgtctgg ctatattgaa ggaagacatg 1740 gaaatgaaga ggcagactct ggatcaactgagtgagattg gccaggatgt gggccaatta 1800 ctcagtaatc ccaaggcatc taagaagatgaacagtgact ctgaggagct aacacagaga 1860 tgggattctc tggttcagag actcgaagactcttctaacc aggtgactca ggcggtagcg 1920 aagctcggca tgtcccagat tccacagaaggacctattgg agaccgttca tgtgagagaa 1980 caagggatgg tgaagaagcc caagcaggaactgcctcctc ctcccccacc aaagaagaga 2040 cagattcacg tggacgtgga ggccaagaaaaagtttgatg ctataagtac agagctgctg 2100 aactggattt tgaaatcaaa gactgccattcagaacacag agatgaaaga atataagaag 2160 tcgcaggaga cctcaggaat gaaaaagaaattgaagggat tagagaaaga acagaaggaa 2220 aatctgcccc gactggacga actgaatcaaaccggacaaa ccctccggga gcaaatggga 2280 aaagaaggcc ttccactgaa agaagtaaacgatgttctgg aaagggtttc gttggagtgg 2340 aagatgatat ctcagcagct agaagatctgggaaggaaga tccagctgca ggaagatata 2400 aatgcttatt ttaagcagct tgatgccattgaggagacca tcaaggagaa ggaagagtgg 2460 ctgaggggca cacccatttc tgaatcgccccggcagccct tgccaggctt aaaggattct 2520 tgccagaggg aactgacaga tctccttggccttcacccca gaattgagac gctgtgtgca 2580 agctgttcag ccctgaagtc tcagccctgtgtcccaggtt ttgtccagca gggttttgac 2640 gaccttcgac atcattacca ggctgttgcgaaggctttag aggaatacca acaacaacta 2700 gaaaatgagc tgaagagcca gcctggacccgagtatttgg acacactgaa taccctgaaa 2760 aaaatgctaa gcgagtcaga 
aaaggcggcccaggcctctc tgaatgccct gaacgatccc 2820 atagcggtgg agcaggccct gcaggagaaaaaggcccttg atgaaaccct tgagaatcag 2880 aaacatacgt tacataagct ttcagaagaaacgaagactt tggagaaaaa tatgcttcct 2940 gatgtgggga aaatgtataa acaagaatttgatgatgtcc aaggcagatg gaataaagta 3000 aagaccaagg tttccagaga cttacacttgctcgaggaaa tcacccccag actccgagat 3060 tttgaggctg attcagaagt cattgagaagtgggtgagtg gcatcaaaga cttcctcatg 3120 aaagaacagg ctgcccaagg agacgctgctgcgcagagcc agcttgacca atgtgctacg 3180 tttgctaatg aaatcgaaac catcgagtcatctctgaaga acatgaggga agtagagact 3240 agccttcaga ggtgtccagt cactggagtcaagacatggg tacaggcaag actagtggat 3300 taccaatccc aactggagaa attcagcaaagagattgcta ttcaaaaaag caggctgtta 3360 gatagtcaag aaaaagccct gaacttgaaaaaggatttgg ctgagatgca ggagtggatg 3420 gcacaggctg aagaggacta cctggagagggacttcgagt acaaatctcc agaagaactc 3480 gagagtgcgg tggaggaaat gaagagggcaaaagaggatg tgctgcagaa ggaggtgagg 3540 gtgaaaattc tgaaggacag catcaagctggtggctgcca aggtgccctc tggtggccag 3600 gagttgacgt cggaattcaa cgaggtgctggagagctacc agcttctgtg caatagaatt 3660 cgagggaagt gccacacact ggaggaggtctggtcttgct gggtggagct gcttcactat 3720 ctggacctgg agaccacgtg gttgaacaccttggaggagc gcgtgaggag cacggaggcc 3780 ctgcctgaga gggcagaagc tgttcatgaagctctggagt ctcttgagtc tgttttgcgc 3840 catccagcgg ataatcgcac ccagattcgggaacttgggc agactctgat tgatggtgga 3900 atcctggatg acataatcag cgagaagctggaggctttta acagccgcta cgaagagctg 3960 agtcacttgg cggagagcaa acagatttctttggagaagc aactccaggt cctccgcgaa 4020 actgaccaca tgcttcaggt gctgaaggagagcctggggg agctggacaa acagcttacc 4080 acatacctga cggacaggat cgatgccttccaactgccac aggaagctca gaagatccaa 4140 gccgaaatct cagcccatga gctcaccctggaggagctga ggaagaatgt gcgctcccag 4200 cccccgacgt cccctgaggg cagggccaccagaggaggaa gtcagatgga catgctacag 4260 aggaaacttc gagaggtctc caccaaattccagcttttcc agaagcccgc aaatttcgag 4320 cagcggatgc tggactgcaa gcgtgtgttggagggagtga aggccgagct tcatgtcctc 4380 gatgtgaggg atgtggaccc tgatgtcattcaggcccact tggacaagtg catgaaacta 4440 tataaaacgt tgagtgaagt caaacttgaagttgagactg tcatcaaaac 
agggaggcac 4500 attgtccaga agcagcagac ggacaacccgaaaagcatgg acgaacagct tacatctctg 4560 aaagtcctct acaatgacct gggcgcacaggtgacagaag ggaagcaaga cctggaaaga 4620 gcctcacagc tgtccaggaa gatgaagaaggaggctgccg tcctctctga atggctctct 4680 gccacagagg cagaactagt gcagaaatccacatcagaag gcgtgattgg tgacctggac 4740 acagaaatct cctgggctaa aagtattctcaaggatctgg aaaagaggaa agttgactta 4800 aatggcatta cagagagcag tgctgcccttcagcacttgg tcttgggcag tgagtctgtt 4860 ctggaagaga acctctgtgt gctcaatgctggatggagcc gagtgcggac gtggaccgaa 4920 gactggtgca acaccttgct gaaccatcaaaaccagctgg agctatttga tggacacgtc 4980 gctcacatca gtacctggct ctatcaagcagaagctctgc tggatgagat cgaaaagaaa 5040 ccagcgagta aacaggaaga aattgtgaagcgtttactgt ctgaattgga tgatgccagc 5100 ctccaggttg agaatgttcg ggaacaagccatcatcttgg tgaatgctcg tggaagcgcc 5160 agcagggaac tcgtggaacc aaaattagccgagctgagca ggaactttga aaaggtgtcc 5220 cagcacataa agagcgcccg aatgctgattggtcaggacc cttcatccta ccaaggcttg 5280 gaccctgctg gaactgttca agctgctgagtctttctctg acttggaaaa cttagaacaa 5340 gacatagaaa acatgttgaa agttgtggaaaagcacttgg accccaataa cgatgagaag 5400 atggatgagg agcaagccca gattgaggaagttctacaaa gaggggagca tttgttacat 5460 gaacctatgg aggacagtaa gaaagaaaagatccgcttgc agttgttact tttgcatact 5520 cgttacaaca aaattaagac aatccctatccagcagagaa aaacaattcc agtttcttct 5580 ggaattacat catcagccct ccctgcagattatttggttg aaattaataa aattttactc 5640 actctggatg acattgaatt atcacttaatatgccggagc taaacaccac tgtctacaaa 5700 gacttctctt tccaggaaga ctctctgaagagtatcaaag gtcaactgga cagacttgga 5760 gagcagattg cagttgttca cgagaagcagccggatgtca tcgtggaagc ctctggccct 5820 gaggccattc agatcaggga catgctcgctcagctgaacg caaaatggga ccgagtgaat 5880 agagtgtaca gtgatcggag agggtcctttgccagggctg tggaggaatg gaggcagttc 5940 caccatgacc ttgatgacct tacacagtggctatctgaag ctgaagacct gctggtagac 6000 acttgtgctc cagatggtag cctggacctggagaaagcca gggcacagca gctggaactg 6060 gaagagggcc tcagcagcca ccagcccagcctgatcaagg ttaaccgaaa gggggaggac 6120 cttgttcaga gactccgccc ctcggaggcaagcttcctga aggagaagct ggcaggtttc 6180 aaccagcgct ggagcactct 
tgtagctgaggtggaggctt tgcagcccag gctaaaagga 6240 gaaagtcagc aggtgttggg gtataagagacggctagatg aggtcacctg ctggttaacg 6300 aaagtggaga gtgctgtgca gaagagatcaacccctgacc cggaagaaag cccacaggaa 6360 ttaacagatt tagcccaaga gacggaagttcaagctgaaa acattaagtg gctgaacaga 6420 gcagaactgg aaatgctttc agacaaaaatctgagtttgc gtgaaagaga gaaactttcg 6480 gaaagtttaa agaatgtaaa cacaacatggaccaaggtat gcagagaagt gcctagcctc 6540 ctgaagacac gcacccaaga cccctgctctgccccacaga tgaggatggc tgctcatccc 6600 aacgtccaaa aggtggtgct agtatcatctgcatcagatg ctcctctgcg tggcggcctg 6660 gaaatctcgg ttcctgctga tttggataaaaccatcacag aactggctga ctggctggta 6720 ttgatcgacc aaatgctgaa gtccaacattgtcactgtgg gggacgtgaa agagatcaat 6780 aagacagttt cccggatgaa aatcacaaaggctgatttag aacaacgcca tcctcagctt 6840 gattgtgtat ttacgttggc ccaaaatttgaaaaacaaag cttccagttc agatgtgaga 6900 acagcaatca cagaaaaatt ggaaaagctgaagacccagt gggagagtac tcagcatggt 6960 gtggagctgc ggcggcagca gctggaggacatggttgtgg acagcctgca gtgggacgac 7020 cacagggaag agactgaaga gctcatgagaaaatacgagg ctcgcttcta catgctgcag 7080 caggcccgcc gggacccact tagcaaacaagtttctgata atcaactatt gcttcaagag 7140 ctggggtctg gcgatggtgt catcatggcgtttgataatg tcctgcagaa acttctggaa 7200 gaatacagtg gcgatgacac aaggaatgtggaagaaacca cggagtactt gaaaacatca 7260 tgggtcaatc tcaaacaaag catcgctgatagacagagtg ccttggaggc tgagctacag 7320 acagtgcaga cttctcgtag agacctggagaactttgtca agtggcttca ggaagcagaa 7380 accacagcaa atgtgctggc cgatgcctctcagcgggaga atgctcttca ggacagtgtc 7440 ctggcccggc agctccgaca gcagatgctggacatccagg cagaaattga tgcccacaat 7500 gacatattta aaagcatcga tggaaaccggcagaagatgg tgaaagctct ggggaattct 7560 gaggaagcaa caatgcttca acatcgactggatgacatga accaaagatg gaatgatttg 7620 aaggcaaaat ctgctagcat cagggcccatttggaggcca gtgctgagaa atggaaccgg 7680 ttgctggcat cgctggaaga gctgatcaaatggctcaata tgaaagatga ggagcttaag 7740 aagcagatgc ccattggagg ggacgtccctgccttacagc tccagtatga ccactgcaag 7800 gtgctgagac gtgagctaaa ggagaaagagtattctgtgc tgaacgccgt agatcaagct 7860 cgagtttttc tggctgatca gccaatagaggcccccgaag aaccaagaag 
aaacccacaa 7920 tcaaagacag agttgactcc tgaggagagagcccagaaga tcgccaaagc catgcgcaag 7980 cagtcttctg aagtccgaga gaagtgggaaaatctaaatg ctgtcactag caactggcaa 8040 aagcaagtag ggaaggcgtt agagaaactccgagacctgc agggagctat ggacgacctg 8100 gacgcagaca tgaaggaggt ggaggctgtgcggaatggct ggaagcccgt gggagacctg 8160 cttatagact ccctgcagga tcacatcgagaaaaccctgg cgtttagaga agaaattgca 8220 ccaatcaact taaaagtaaa aacaatgaatgacctgtcca gtcagctgtc tccacttgac 8280 ttgcatccat ctctaaagat gtctcgccagctggatgacc ttaatatgcg atggaaactt 8340 ctacaggttt ccgtggacga tcgccttaagcagctccagg aagcccacag agattttggg 8400 ccatcttctc aacactttct gtccacttcagtccagctgc cgtggcagag atccatttca 8460 cataataaag tgccctatta catcaaccatcaaacacaga caacctgttg ggatcatcct 8520 aaaatgactg agctcttcca atcccttgctgatctgaata atgtacgttt ctctgcctac 8580 cgcacagcaa tcaaaattcg aaggctgcaaaaagcattat gtctggatct cttagagctg 8640 aatacgacga atgaagtttt caagcagcacaaactgaacc aaaatgatca gctcctgagt 8700 gtcccagacg tcatcaactg tctgaccaccacttacgatg ggcttgagca gctgcacaag 8760 gacttggtca atgttccact ctgcgtcgatatgtgtctca actggctgct caacgtatac 8820 gacacgggcc ggactggaaa aattcgggtacagagtctga agattggatt gatgtctctc 8880 tccaaaggcc tcttagaaga gaaatacagatgtctcttta aggaggtggc agggccaact 8940 gagatgtgtg accagcggca gcttggcctgctacttcacg atgccatcca gatccctagg 9000 cagctggggg aagtagcagc ctttgggggcagtaacattg agcccagtgt ccgcagctgc 9060 ttccagcaga ataacaacaa gccagaaatcagtgtgaagg agtttataga ctggatgcat 9120 ttggaacccc agtccatggt gtggttgccggttctgcatc gggtcgcagc tgctgagact 9180 gcaaaacatc aggccaaatg caacatctgcaaagaatgcc cgattgttgg gttcagatac 9240 aggagcctaa agcattttaa ttatgatgtctgccagagtt gcttcttttc tggaagaaca 9300 gcaaagggcc acaagttaca ttacccgatggtagaatact gcataccgac aacatctggg 9360 gaagatgtga gagatttcac taaggtgctgaagaacaagt tcaggtccaa gaaatatttt 9420 gccaaacatc ctcggcttgg ctacctgcctgtccagaccg tgctggaagg ggacaactta 9480 gaaactccta tcacgctcat cagtatgtggccagagcact atgacccctc ccagtcccct 9540 cagctgtttc atgatgacac ccactcaagaatagagcaat acgctacacg actggcccag 9600 atggaaagga caaacgggtc 
cttcctaactgatagcagct ctacaacagg aagcgtggag 9660 gatgagcatg ccctcatcca gcagtactgccagaccctgg gcggggagtc acctgtgagt 9720 cagccgcaga gtccagctca gatcctgaagtccgtggaga gggaagagcg tggggaactg 9780 gagcggatca ttgctgactt ggaggaagagcaaagaaatc tgcaggtgga gtatgagcag 9840 ctgaaggagc agcacctaag aaggggtctccctgtgggct cccctccaga ctccatcgta 9900 tctcctcacc acacatctga ggactcagaacttatagcag aagctaaact cctgcggcag 9960 cacaaagggc ggctggaggc gaggatgcaaattttggaag atcacaataa acagctggag 10020 tctcagctgc accgcctcag acagctcctggagcagcctg actctgactc ccgcatcaat 10080 ggtgtctccc cctgggcttc cccacagcattctgcattga gctactcact tgacactgac 10140 ccaggcccac agttccacca ggcagcatctgaggacctgc tggccccacc tcacgacact 10200 agcacggacc tcacggacgt gatggagcagatcaacagca cgtttccctc ttgcagctca 10260 aatgtcccca gcaggccaca ggcaatgtgagcatctatcc agccagccaa catttcccga 10320 ccttcagtat tgccctcttc tgcaaatgccaatcccaaga cccattcaac cccaaagctc 10380 cgtggctcca cgacacaagc tgttgagtgcttactgggtg ttctactgag ggaaccaaac 10440 actgactatc caaagatatt ttggttttctaataacgtat attattgttt tctttctccc 10500 ctttctatgc aactgtaaat taatgaacagagaagtattt ggaggtggta aagcatttgt 10560 cactgatttg tataatatat acagccatgggaaagtgggt gggggctttc taatatgaaa 10620 ctgtcttttt aataaccaag agaaaaaattgcataagaat tagaccactt tacattatta 10680 cattccttct gctgttcaca ttaaccttgtacaataactt cacttattat ttgactgttt 10740 taccattatg ttttggttat ttataaatttatcagccata ccaaacgaat agattctatg 10800 tatttggttt ctataatctg gccaaattcctaagttcata tatttgaatc aaatatttta 10860 catatgtgga gtaggcaggc attctgaagatactatttaa ctttagttga cgtcacacac 10920 accatccttt agtaaccact ggatgactacactaaaaatc ctgtggactt taacggcaag 10980 ctgctggggt atttttcctc ctgtttttattccttttttg taagtagatc ttgacgtctt 11040 tatttatttc atcttgcaat ctctataataaagaagactg tattgtaata gtcccc 11096 5 208 DNA Homo sapiens 5 gggattccctcactttcccc ctacaggact cagatctggg aggcaattac cttcggagaa 60 aaacgaataggaaaaactga agtgttactt tttttaaagc tgctgaagtt tgttggtttc 120 tcattgtttttaagcctact ggagcaataa agtttgaaga acttttacca ggtttttttt 180 atcgctgccttgatatacac 
ttttcaaa 208 6 756 DNA Homo sapiens 6 atgctttggt gggaagaagtagaggactgt tatgaaagag aagatgttca aaagaaaaca 60 ttcacaaaat gggtaaatgcacaattttct aagtttggga agcagcatat tgagaacctc 120 ttcagtgacc tacaggatgggaggcgcctc ctagacctcc tcgaaggcct gacagggcaa 180 aaactgccaa aagaaaaaggatccacaaga gttcatgccc tgaacaatgt caacaaggca 240 ctgcgggttt tgcagaacaataatgttgat ttagtgaata ttggaagtac tgacatcgta 300 gatggaaatc ataaactgactcttggtttg atttggaata taatcctcca ctggcaggtc 360 aaaaatgtaa tgaaaaatatcatggctgga ttgcaacaaa ccaacagtga aaagattctc 420 ctgagctggg tccgacaatcaactcgtaat tatccacagg ttaatgtaat caacttcacc 480 accagctggt ctgatggcctggctttgaat gctctcatcc atagtcatag gccagaccta 540 tttgactgga atagtgtggtttgccagcag tcagccacac aacgactgga acatgcattc 600 aacatcgcca gatatcaattaggcatagag aaactactcg atcctgaaga tgttgatacc 660 acctatccag ataagaagtccatcttaatg tacatcacat cactcttcca agttttgcct 720 caacaagtga gcattgaagccatccaggaa gtggaa 756 7 255 DNA Homo sapiens 7 atgttgccaa ggccacctaaagtgactaaa gaagaacatt ttcagttaca tcatcaaatg 60 cactattctc aacagatcacggtcagtcta gcacagggat atgagagaac ttcttcccct 120 aagcctcgat tcaagagctatgcctacaca caggctgctt atgtcaccac ctctgaccct 180 acacggagcc catttccttcacagcatttg gaagctcctg aagacaagtc atttggcagt 240 tcattgatgg agagt 255 8327 DNA Homo sapiens 8 gaagtaaacc tggaccgtta tcaaacagct ttagaagaagtattatcgtg gcttctttct 60 gctgaggaca cattgcaagc acaaggagag atttctaatgatgtggaagt ggtgaaagac 120 cagtttcata ctcatgaggg gtacatgatg gatttgacagcccatcaggg ccgggttggt 180 aatattctac aattgggaag taagctgatt ggaacaggaaaattatcaga agatgaagaa 240 actgaagtac aagagcagat gaatctccta aattcaagatgggaatgcct cagggtagct 300 agcatggaaa aacaaagcaa tttacat 327 9 333 DNAHomo sapiens 9 agagttttaa tggatctcca gaatcagaaa ctgaaagagt tgaatgactggctaacaaaa 60 acagaagaaa gaacaaggaa aatggaggaa gagcctcttg gacctgatcttgaagaccta 120 aaacgccaag tacaacaaca taaggtgctt caagaagatc tagaacaagaacaagtcagg 180 gtcaattctc tcactcacat ggtggtggta gttgatgaat ctagtggagatcacgcaact 240 gctgctttgg aagaacaact taaggtattg ggagatcgat gggcaaacatctgtagatgg 300 
acagaagacc gctgggttct tttacaagac atc 333 10 333 DNA Homosapiens 10 cttctcaaat ggcaacgtct tactgaagaa cagtgccttt ttagtgcatggctttcagaa 60 aaagaagatg cagtgaacaa gattcacaca actggcttta aagatcaaaatgaaatgtta 120 tcaagtcttc aaaaactggc cgttttaaaa gcggatctag aaaagaaaaagcaatccatg 180 ggcaaactgt attcactcaa acaagatctt ctttcaacac tgaagaataagtcagtgacc 240 cagaagacgg aagcatggct ggataacttt gcccggtgtt gggataatttagtccaaaaa 300 cttgaaaaga gtacagcaca gatttcacag gct 333 11 147 DNA Homosapiens 11 gtcaccacca ctcagccatc actaacacag acaactgtaa tggaaacagtaactacggtg 60 accacaaggg aacagatcct ggtaaagcat gctcaagagg aacttccaccaccacctccc 120 caaaagaaga ggcagattac tgtggat 147 12 333 DNA Homo sapiens12 tctgaaatta ggaaaaggtt ggatgttgat ataactgaac ttcacagctg gattactcgc 60tcagaagctg tgttgcagag tcctgaattt gcaatctttc ggaaggaagg caacttctca 120gacttaaaag aaaaagtcaa tgccatagag cgagaaaaag ctgagaagtt cagaaaactg 180caagatgcca gcagatcagc tcaggccctg gtggaacaga tggtgaatga gggtgttaat 240gcagatagca tcaaacaagc ctcagaacaa ctgaacagcc ggtggatcga attctgccag 300ttgctaagtg agagacttaa ctggctggag tat 333 13 327 DNA Homo sapiens 13cagaacaaca tcatcgcttt ctataatcag ctacaacaat tggagcagat gacaactact 60gctgaaaact ggttgaaaat ccaacccacc accccatcag agccaacagc aattaaaagt 120cagttaaaaa tttgtaagga tgaagtcaac cggctatcag gtcttcaacc tcaaattgaa 180cgattaaaaa ttcaaagcat agccctgaaa gagaaaggac aaggacccat gttcctggat 240gcagactttg tggcctttac aaatcatttt aagcaagtct tttctgatgt gcaggccaga 300gagaaagagc tacagacaat ttttgac 327 14 327 DNA Homo sapiens 14 actttgccaccaatgcgcta tcaggagacc atgagtgcca tcaggacatg ggtccagcag 60 tcagaaaccaaactctccat acctcaactt agtgtcaccg actatgaaat catggagcag 120 agactcggggaattgcaggc tttacaaagt tctctgcaag agcaacaaag tggcctatac 180 tatctcagcaccactgtgaa agagatgtcg aagaaagcgc cctctgaaat tagccggaaa 240 tatcaatcagaatttgaaga aattgaggga cgctggaaga agctctcctc ccagctggtt 300 gagcattgtcaaaagctaga ggagcaa 327 15 327 DNA Homo sapiens 15 atgaataaac tccgaaaaattcagaatcac atacaaaccc tgaagaaatg gatggctgaa 60 gttgatgttt ttctgaaggaggaatggcct 
gcccttgggg attcagaaat tctaaaaaag 120 cagctgaaac agtgcagacttttagtcagt gatattcaga caattcagcc cagtctaaac 180 agtgtcaatg aaggtgggcagaagataaag aatgaagcag agccagagtt tgcttcgaga 240 cttgagacag aactcaaagaacttaacact cagtgggatc acatgtgcca acaggtctat 300 gccagaaagg aggccttgaagggaggt 327 16 327 DNA Homo sapiens 16 ttggagaaaa ctgtaagcct ccagaaagatctatcagaga tgcacgaatg gatgacacaa 60 gctgaagaag agtatcttga gagagattttgaatataaaa ctccagatga attacagaaa 120 gcagttgaag agatgaagag agctaaagaagaggcccaac aaaaagaagc gaaagtgaaa 180 ctccttactg agtctgtaaa tagtgtcatagctcaagctc cacctgtagc acaagaggcc 240 ttaaaaaagg aacttgaaac tctaaccaccaactaccagt ggctctgcac taggctgaat 300 gggaaatgca agactttgga agaagtt 32717 312 DNA Homo sapiens 17 tgggcatgtt ggcatgagtt attgtcatac ttggagaaagcaaacaagtg gctaaatgaa 60 gtagaattta aacttaaaac cactgaaaac attcctggcggagctgagga aatctctgag 120 gtgctagatt cacttgaaaa tttgatgcga cattcagaggataacccaaa tcagattcgc 180 atattggcac agaccctaac agatggcgga gtcatggatgagctaatcaa tgaggaactt 240 gagacattta attctcgttg gagggaacta catgaagaggctgtaaggag gcaaaagttg 300 cttgaacaga gc 312 18 276 DNA Homo sapiens 18atccagtctg cccaggagac tgaaaaatcc ttacacttaa tccaggagtc cctcacattc 60attgacaagc agttggcagc ttatattgca gacaaggtgg acgcagctca aatgcctcag 120gaagcccaga aaatccaatc tgatttgaca agtcatgaga tcagtttaga agaaatgaag 180aaacataatc aggggaagga ggctgcccaa agagtcctgt ctcagattga tgttgcacag 240aaaaaattac aagatgtctc catgaagttt cgatta 276 19 327 DNA Homo sapiens 19ttccagaaac cagccaattt tgagctgcgt ctacaagaaa gtaagatgat tttagatgaa 60gtgaagatgc acttgcctgc attggaaaca aagagtgtgg aacaggaagt agtacagtca 120cagctaaatc attgtgtgaa cttgtataaa agtctgagtg aagtgaagtc tgaagtggaa 180atggtgataa agactggacg tcagattgta cagaaaaagc agacggaaaa tcccaaagaa 240cttgatgaaa gagtaacagc tttgaaattg cattataatg agctgggagc aaaggtaaca 300gaaagaaagc aacagttgga gaaatgc 327 20 324 DNA Homo sapiens 20 ttgaaattgtcccgtaagat gcgaaaggaa atgaatgtct tgacagaatg gctggcagct 60 acagatatggaattgacaaa gagatcagca gttgaaggaa tgcctagtaa tttggattct 120 
gaagttgcctggggaaaggc tactcaaaaa gagattgaga aacagaaggt gcacctgaag 180 agtatcacagaggtaggaga ggccttgaaa acagttttgg gcaagaagga gacgttggtg 240 gaagataaactcagtcttct gaatagtaac tggatagctg tcacctcccg agcagaagag 300 tggttaaatcttttgttgga atac 324 21 312 DNA Homo sapiens 21 cagaaacaca tggaaacttttgaccagaat gtggaccaca tcacaaagtg gatcattcag 60 gctgacacac ttttggatgaatcagagaaa aagaaacccc agcaaaaaga agacgtgctt 120 aagcgtttaa aggcagaactgaatgacata cgcccaaagg tggactctac acgtgaccaa 180 gcagcaaact tgatggcaaaccgcggtgac cactgcagga aattagtaga gccccaaatc 240 tcagagctca accatcgatttgcagccatt tcacacagaa ttaagactgg aaaggcctcc 300 attcctttga ag 312 22 282DNA Homo sapiens 22 gaattggagc agtttaactc agatatacaa aaattgcttgaaccactgga ggctgaaatt 60 cagcaggggg tgaatctgaa agaggaagac ttcaataaagatatgaatga agacaatgag 120 ggtactgtaa aagaattgtt gcaaagagga gacaacttacaacaaagaat cacagatgag 180 agaaagagag aggaaataaa gataaaacag cagctgttacagacaaaaca taatgctctc 240 aaggatttga ggtctcaaag aagaaaaaag gctctagaaa tt282 23 294 DNA Homo sapiens 23 tctcatcagt ggtatcagta caagaggcaggctgatgatc tcctgaaatg cttggatgac 60 attgaaaaaa aattagccag cctacctgagcccagagatg aaaggaaaat aaaggaaatt 120 gatcgggaat tgcagaagaa gaaagaggagctgaatgcag tgcgtaggca agctgagggc 180 ttgtctgagg atggggccgc aatggcagtggagccaactc agatccagct cagcaagcgc 240 tggcgggaaa ttgagagcaa atttgctcagtttcgaagac tcaactttgc acaa 294 24 327 DNA Homo sapiens 24 tcttatgtgccttctactta tttgactgaa atcactcatg tctcacaagc cctattagaa 60 gtggaacaacttctcaatgc tcctgacctc tgtgctaagg actttgaaga tctctttaag 120 caagaggagtctctgaagaa tataaaagat agtctacaac aaagctcagg tcggattgac 180 attattcatagcaagaagac agcagcattg caaagtgcaa cgcctgtgga aagggtgaag 240 ctacaggaagctctctccca gcttgatttc caatgggaaa aagttaacaa aatgtacaag 300 gaccgacaagggcgatttga cagatct 327 25 321 DNA Homo sapiens 25 gttgagaaat ggcggcgttttcattatgat ataaagatat ttaatcagtg gctaacagaa 60 gctgaacagt ttctcagaaagacacaaatt cctgagaatt gggaacatgc taaatacaaa 120 tggtatctta aggaactccaggatggcatt gggcagcggc aaactgttgt cagaacattg 180 aatgcaactg 
gggaagaaataattcagcaa tcctcaaaaa cagatgccag tattctacag 240 gaaaaattgg gaagcctgaatctgcggtgg caggaggtct gcaaacagct gtcagacaga 300 aaaaagaggc tagaagaaca a321 26 351 DNA Homo sapiens 26 aagaatatct tgtcagaatt tcaaagagatttaaatgaat ttgttttatg gttggaggaa 60 gcagataaca ttgctagtat cccacttgaacctggaaaag agcagcaact aaaagaaaag 120 cttgagcaag tcaagttact ggtggaagagttgcccctgc gccagggaat tctcaaacaa 180 ttaaatgaaa ctggaggacc cgtgcttgtaagtgctccca taagcccaga agagcaagat 240 aaacttgaaa ataagctcaa gcagacaaatctccagtgga taaaggtttc cagagcttta 300 cctgagaaac aaggagaaat tgaagctcaaataaaagacc ttgggcagct t 351 27 303 DNA Homo sapiens 27 gaaaaaaagcttgaagacct tgaagagcag ttaaatcatc tgctgctgtg gttatctcct 60 attaggaatcagttggaaat ttataaccaa ccaaaccaag aaggaccatt tgacgttcag 120 gaaactgaaatagcagttca agctaaacaa ccggatgtgg aagagatttt gtctaaaggg 180 cagcatttgtacaaggaaaa accagccact cagccagtga agaggaagtt agaagatctg 240 agctctgagtggaaggcggt aaaccgttta cttcaagagc tgagggcaaa gcagcctgac 300 cta 303 28123 DNA Homo sapiens 28 gctcctggac tgaccactat tggagcctct cctactcagactgttactct ggtgacacaa 60 cctgtggtta ctaaggaaac tgccatctcc aaactagaaatgccatcttc cttgatgttg 120 gag 123 29 330 DNA Homo sapiens 29 gtacctgctctggcagattt caaccgggct tggacagaac ttaccgactg gctttctctg 60 cttgatcaagttataaaatc acagagggtg atggtgggtg accttgagga tatcaacgag 120 atgatcatcaagcagaaggc aacaatgcag gatttggaac agaggcgtcc ccagttggaa 180 gaactcattaccgctgccca aaatttgaaa aacaagacca gcaatcaaga ggctagaaca 240 atcattacggatcgaattga aagaattcag aatcagtggg atgaagtaca agaacacctt 300 cagaaccggaggcaacagtt gaatgaaatg 330 30 327 DNA Homo sapiens 30 ttaaaggattcaacacaatg gctggaagct aaggaagaag ctgagcaggt cttaggacag 60 gccagagccaagcttgagtc atggaaggag ggtccctata cagtagatgc aatccaaaag 120 aaaatcacagaaaccaagca gttggccaaa gacctccgcc agtggcagac aaatgtagat 180 gtggcaaatgacttggccct gaaacttctc cgggattatt ctgcagatga taccagaaaa 240 gtccacatgataacagagaa tatcaatgcc tcttggagaa gcattcataa aagggtgagt 300 gagcgagaggctgctttgga agaaact 327 31 348 DNA Homo sapiens 31 catagattac 
tgcaacagttccccctggac ctggaaaagt ttcttgcctg gcttacagaa 60 gctgaaacaa ctgccaatgtcctacaggat gctacccgta aggaaaggct cctagaagac 120 tccaagggag taaaagagctgatgaaacaa tggcaagacc tccaaggtga aattgaagct 180 cacacagatg tttatcacaacctggatgaa aacagccaaa aaatcctgag atccctggaa 240 ggttccgatg atgcagtcctgttacaaaga cgtttggata acatgaactt caagtggagt 300 gaacttcgga aaaagtctctcaacattagg tcccatttgg aagccagt 348 32 387 DNA Homo sapiens 32 tctgaccagtggaagcgtct gcacctttct ctgcaggaac ttctggtgtg gctacagctg 60 aaagatgatgaattaagccg gcaggcacct attggaggcg actttccagc agttcagaag 120 cagaacgatgtacatagggc cttcaagagg gaattgaaaa ctaaagaacc tgtaatcatg 180 agtactcttgagactgtacg aatatttctg acagagcagc ctttggaagg actagagaaa 240 ctctaccaggagcccagaga gctgcctcct gaggagagag cccagaatgt cactcggctt 300 ctacgaaagcaggctgagga ggtcaatact gagtgggaaa aattgaacct gcactccgct 360 gactggcagagaaaaataga tgagacc 387 33 324 DNA Homo sapiens 33 cttgaaagac tccaggaacttcaagaggcc acggatgagc tggacctcaa gctgcgccaa 60 gctgaggtga tcaagggatcctggcagccc gtgggcgatc tcctcattga ctctctccaa 120 gatcacctcg agaaagtcaaggcacttcga ggagaaattg cgcctctgaa agagaacgtg 180 agccacgtca atgaccttgctcgccagctt accactttgg gcattcagct ctcaccgtat 240 aacctcagca ctctggaagacctgaacacc agatggaagc ttctgcaggt ggccgtcgag 300 gaccgagtca ggcagctgcatgaa 324 34 216 DNA Homo sapiens 34 gcccacaggg actttggtcc agcatctcagcactttcttt ccacgtctgt ccagggtccc 60 tgggagagag ccatctcgcc aaacaaagtgccctactata tcaaccacga gactcaaaca 120 acttgctggg accatcccaa aatgacagagctctaccagt ctttagctga cctgaataat 180 gtcagattct cagcttatag gactgccatgaaactc 216 35 887 DNA Homo sapiens 35 cgaagactgc agaaggccct ttgcttggatctcttgagcc tgtcagctgc atgtgatgcc 60 ttggaccagc acaacctcaa gcaaaatgaccagcccatgg atatcctgca gattattaat 120 tgtttgacca ctatttatga ccgcctggagcaagagcaca acaatttggt caacgtccct 180 ctctgcgtgg atatgtgtct gaactggctgctgaatgttt atgatacggg acgaacaggg 240 aggatccgtg tcctgtcttt taaaactggcatcatttccc tgtgtaaagc acatttggaa 300 gacaagtaca gatacctttt caagcaagtggcaagttcaa caggattttg tgaccagcgc 360 aggctgggcc 
tccttctgca tgattctatccaaattccaa gacagttggg tgaagttgca 420 tcctttgggg gcagtaacat tgagccaagtgtccggagct gcttccaatt tgctaataat 480 aagccagaga tcgaagcggc cctcttcctagactggatga gactggaacc ccagtccatg 540 gtgtggctgc ccgtcctgca cagagtggctgctgcagaaa ctgccaagca tcaggccaaa 600 tgtaacatct gcaaagagtg tccaatcattggattcaggt acaggagtct aaagcacttt 660 aattatgaca tctgccaaag ctgctttttttctggtcgag ttgcaaaagg ccataaaatg 720 cactatccca tggtggaata ttgcactccgactacatcag gagaagatgt tcgagacttt 780 gccaaggtac taaaaaacaa atttcgaaccaaaaggtatt ttgcgaagca tccccgaatg 840 ggctacctgc cagtgcagac tgtcttagagggggacaaca tggaaac 887 36 823 DNA Homo sapiens 36 tcccgttact ctgatcaacttctggccagt agattctgcg cctgcctcgt cccctcagct 60 ttcacacgat gatactcattcacgcattga acattatgct agcaggctag cagaaatgga 120 aaacagcaat ggatcttatctaaatgatag catctctcct aatgagagca tagatgatga 180 acatttgtta atccagcattactgccaaag tttgaaccag gactcccccc tgagccagcc 240 tcgtagtcct gcccagatcttgatttcctt agagagtgag gaaagagggg agctagagag 300 aatcctagca gatcttgaggaagaaaacag gaatctgcaa gcagaatatg accgtctaaa 360 gcagcagcac gaacataaaggcctgtcccc actgccgtcc cctcctgaaa tgatgcccac 420 ctctccccag agtccccgggatgctgagct cattgctgag gccaagctac tgcgtcaaca 480 caaaggccgc ctggaagccaggatgcaaat cctggaagac cacaataaac agctggagtc 540 acagttacac aggctaaggcagctgctgga gcaaccccag gcagaggcca aagtgaatgg 600 cacaacggtg tcctctccttctacctctct acagaggtcc gacagcagtc agcctatgct 660 gctccgagtg gttggcagtcaaacttcgga ctccatgggt gaggaagatc ttctcagtcc 720 tccccaggac acaagcacagggttagagga ggtgatggag caactcaaca actccttccc 780 tagttcaaga ggaagaaatacccctggaaa gccaatgaga gag 823 37 12 DNA Homo sapiens 37 gacacaatgt ag 1238 2691 DNA Homo sapiens 38 gaagtctttt ccacatggca gatgatttgg gcagagcgatggagtcctta gtatcagtca 60 tgacagatga agaaggagca gaataaatgt tttacaactcctgattcccg catggttttt 120 ataatattca tacaacaaag aggattagac agtaagagtttacaagaaat aaatctatat 180 ttttgtgaag ggtagtggta ttatactgta gatttcagtagtttctaagt ctgttattgt 240 tttgttaaca atggcaggtt ttacacgtct atgcaattgtacaaaaaagt tataagaaaa 300 ctacatgtaa 
aatcttgata gctaaataac ttgccatttctttatatgga acgcattttg 360 ggttgtttaa aaatttataa cagttataaa gaaagattgtaaactaaagt gtgctttata 420 aaaaaaagtt gtttataaaa acccctaaaa acaaaacaaacacacacaca cacacataca 480 cacacacaca caaaactttg aggcagcgca ttgttttgcatccttttggc gtgatatcca 540 tatgaaattc atggcttttt ctttttttgc atattaaagataagacttcc tctaccacca 600 caccaaatga ctactacaca ctgctcattt gagaactgtcagctgagtgg ggcaggcttg 660 agttttcatt tcatatatct atatgtctat aagtatataaatactatagt tatatagata 720 aagagatacg aatttctata gactgacttt ttccattttttaaatgttca tgtcacatcc 780 taatagaaag aaattacttc tagtcagtca tccaggcttacctgcttggt ctagaatgga 840 tttttcccgg agccggaagc caggaggaaa ctacaccacactaaaacatt gtctacagct 900 ccagatgttt ctcattttaa acaactttcc actgacaacgaaagtaaagt aaagtattgg 960 atttttttaa agggaacatg tgaatgaata cacaggacttattatatcag agtgagtaat 1020 cggttggttg gttgattgat tgattgattg atacattcagcttcctgctg ctagcaatgc 1080 cacgatttag atttaatgat gcttcagtgg aaatcaatcagaaggtattc tgaccttgtg 1140 aacatcagaa ggtatttttt aactcccaag cagtagcaggacgatgatag ggctggaggg 1200 ctatggattc ccagcccatc cctgtgaagg agtaggccactctttaagtg aaggattgga 1260 tgattgttca taatacataa agttctctgt aattacaactaaattattat gccctcttct 1320 cacagtcaaa aggaactggg tggtttggtt tttgttgcttttttagattt attgtcccat 1380 gtgggatgag tttttaaatg ccacaagaca taatttaaaataaataaact ttgggaaaag 1440 gtgtaagaca gtagccccat cacatttgtg atactgacaggtatcaaccc agaagcccat 1500 gaactgtgtt tccatccttt gcatttctct gcgagtagttccacacaggt ttgtaagtaa 1560 gtaagaaaga aggcaaattg attcaaatgt tacaaaaaaacccttcttgg tggattagac 1620 aggttaaata tataaacaaa caaacaaaaa ttgctcaaaaaagaggagaa aagctcaaga 1680 ggaaaagcta aggactggta ggaaaaagct ttactctttcatgccatttt atttcttttt 1740 gatttttaaa tcattcattc aatagatacc accgtgtgacctataatttt gcaaatctgt 1800 tacctctgac atcaagtgta attagctttt ggagagtgggctgacatcaa gtgtaattag 1860 cttttggaga gtgggttttg tccattatta ataattaattaattaacatc aaacacggct 1920 tctcatgcta tttctacctc actttggttt tggggtgttcctgataattg tgcacacctg 1980 agttcacagc ttcaccactt gtccattgcg ttattttctttttcctttat 
aattctttct 2040 ttttccttca taattttcaa aagaaaaccc aaagctctaaggtaacaaat taccaaatta 2100 catgaagatt tggtttttgt cttgcatttt tttcctttatgtgacgctgg accttttctt 2160 tacccaagga tttttaaaac tcagatttaa aacaaggggttactttacat cctactaaga 2220 agtttaagta agtaagtttc attctaaaat cagaggtaaatagagtgcat aaataatttt 2280 gttttaatct ttttgttttt cttttagaca cattagctctggagtgagtc tgtcataata 2340 tttgaacaaa aattgagagc tttattgctg cattttaagcataattaatt tggacattat 2400 ttcgtgttgt gttctttata accaccgagt attaaactgtaaatcataat gtaactgaag 2460 cataaacatc acatggcatg ttttgtcatt gttttcaggtactgagttct tacttgagta 2520 tcataatata ttgtgtttta acaccaacac tgtaacatttacgaattatt tttttaaact 2580 tcagttttac tgcattttca caacatatca gacttcaccaaatatatgcc ttactattgt 2640 attatagtac tgctttactg tgtatctcaa taaagcacgcagttatgtta c 2691 39 5417 DNA Artificial Sequence Synthetic 39gggattccct cactttcccc ctacaggact cagatctggg aggcaattac cttcggagaa 60aaacgaatag gaaaaactga agtgttactt tttttaaagc tgctgaagtt tgttggtttc 120tcattgtttt taagcctact ggagcaataa agtttgaaga acttttacca ggtttttttt 180atcgctgcct tgatatacac ttttcaaaat gctttggtgg gaagaagtag aggactgtta 240tgaaagagaa gatgttcaaa agaaaacatt cacaaaatgg gtaaatgcac aattttctaa 300gtttgggaag cagcatattg agaacctctt cagtgaccta caggatggga ggcgcctcct 360agacctcctc gaaggcctga cagggcaaaa actgccaaaa gaaaaaggat ccacaagagt 420tcatgccctg aacaatgtca acaaggcact gcgggttttg cagaacaata atgttgattt 480agtgaatatt ggaagtactg acatcgtaga tggaaatcat aaactgactc ttggtttgat 540ttggaatata atcctccact ggcaggtcaa aaatgtaatg aaaaatatca tggctggatt 600gcaacaaacc aacagtgaaa agattctcct gagctgggtc cgacaatcaa ctcgtaatta 660tccacaggtt aatgtaatca acttcaccac cagctggtct gatggcctgg ctttgaatgc 720tctcatccat agtcataggc cagacctatt tgactggaat agtgtggttt gccagcagtc 780agccacacaa cgactggaac atgcattcaa catcgccaga tatcaattag gcatagagaa 840actactcgat cctgaagatg ttgataccac ctatccagat aagaagtcca tcttaatgta 900catcacatca ctcttccaag ttttgcctca acaagtgagc attgaagcca tccaggaagt 960ggaaatgttg ccaaggccac ctaaagtgac taaagaagaa cattttcagt tacatcatca 
1020aatgcactat tctcaacaga tcacggtcag tctagcacag ggatatgaga gaacttcttc 1080ccctaagcct cgattcaaga gctatgccta cacacaggct gcttatgtca ccacctctga 1140ccctacacgg agcccatttc cttcacagca tttggaagct cctgaagaca agtcatttgg 1200cagttcattg atggagagtg aagtaaacct ggaccgttat caaacagctt tagaagaagt 1260attatcgtgg cttctttctg ctgaggacac attgcaagca caaggagaga tttctaatga 1320tgtggaagtg gtgaaagacc agtttcatac tcatgagggg tacatgatgg atttgacagc 1380ccatcagggc cgggttggta atattctaca attgggaagt aagctgattg gaacaggaaa 1440attatcagaa gatgaagaaa ctgaagtaca agagcagatg aatctcctaa attcaagatg 1500ggaatgcctc agggtagcta gcatggaaaa acaaagcaat ttacatagag ttttaatgga 1560tctccagaat cagaaactga aagagttgaa tgactggcta acaaaaacag aagaaagaac 1620aaggaaaatg gaggaagagc ctcttggacc tgatcttgaa gacctaaaac gccaagtaca 1680acaacataag gtgcttcaag aagatctaga acaagaacaa gtcagggtca attctctcac 1740tcacatggtg gtggtagttg atgaatctag tggagatcac gcaactgctg ctttggaaga 1800acaacttaag gtattgggag atcgatgggc aaacatctgt agatggacag aagaccgctg 1860ggttctttta caagacatcc ttctcaaatg gcaacgtctt actgaagaac agtgcctttt 1920tagtgcatgg ctttcagaaa aagaagatgc agtgaacaag attcacacaa ctggctttaa 1980agatcaaaat gaaatgttat caagtcttca aaaactggcc gttttaaaag cggatctaga 2040aaagaaaaag caatccatgg gcaaactgta ttcactcaaa caagatcttc tttcaacact 2100gaagaataag tcagtgaccc agaagacgga agcatggctg gataactttg cccggtgttg 2160ggataattta gtccaaaaac ttgaaaagag tacagcacag atttcacagg ctgtcaccac 2220cactcagcca tcactaacac agacaactgt aatggaaaca gtaactacgg tgaccacaag 2280ggaacagatc ctggtaaagc atgctcaaga ggaacttcca ccaccacctc cccaaaagaa 2340gaggcagatt actgtggatc ttgaaagact ccaggaactt caagaggcca cggatgagct 2400ggacctcaag ctgcgccaag ctgaggtgat caagggatcc tggcagcccg tgggcgatct 2460cctcattgac tctctccaag atcacctcga gaaagtcaag gcacttcgag gagaaattgc 2520gcctctgaaa gagaacgtga gccacgtcaa tgaccttgct cgccagctta ccactttggg 2580cattcagctc tcaccgtata acctcagcac tctggaagac ctgaacacca gatggaagct 2640tctgcaggtg gccgtcgagg accgagtcag gcagctgcat gaagcccaca gggactttgg 2700tccagcatct cagcactttc tttccacgtc 
tgtccagggt ccctgggaga gagccatctc 2760gccaaacaaa gtgccctact atatcaacca cgagactcaa acaacttgct gggaccatcc 2820caaaatgaca gagctctacc agtctttagc tgacctgaat aatgtcagat tctcagctta 2880taggactgcc atgaaactcc gaagactgca gaaggccctt tgcttggatc tcttgagcct 2940gtcagctgca tgtgatgcct tggaccagca caacctcaag caaaatgacc agcccatgga 3000tatcctgcag attattaatt gtttgaccac tatttatgac cgcctggagc aagagcacaa 3060caatttggtc aacgtccctc tctgcgtgga tatgtgtctg aactggctgc tgaatgttta 3120tgatacggga cgaacaggga ggatccgtgt cctgtctttt aaaactggca tcatttccct 3180gtgtaaagca catttggaag acaagtacag ataccttttc aagcaagtgg caagttcaac 3240aggattttgt gaccagcgca ggctgggcct ccttctgcat gattctatcc aaattccaag 3300acagttgggt gaagttgcat cctttggggg cagtaacatt gagccaagtg tccggagctg 3360cttccaattt gctaataata agccagagat cgaagcggcc ctcttcctag actggatgag 3420actggaaccc cagtccatgg tgtggctgcc cgtcctgcac agagtggctg ctgcagaaac 3480tgccaagcat caggccaaat gtaacatctg caaagagtgt ccaatcattg gattcaggta 3540caggagtcta aagcacttta attatgacat ctgccaaagc tgcttttttt ctggtcgagt 3600tgcaaaaggc cataaaatgc actatcccat ggtggaatat tgcactccga ctacatcagg 3660agaagatgtt cgagactttg ccaaggtact aaaaaacaaa tttcgaacca aaaggtattt 3720tgcgaagcat ccccgaatgg gctacctgcc agtgcagact gtcttagagg gggacaacat 3780ggaaacgcct gcctcgtccc ctcagctttc acacgatgat actcattcac gcattgaaca 3840ttatgctagc aggctagcag aaatggaaaa cagcaatgga tcttatctaa atgatagcat 3900ctctcctaat gagagcatag atgatgaaca tttgttaatc cagcattact gccaaagttt 3960gaaccaggac tcccccctga gccagcctcg tagtcctgcc cagatcttga tttccttaga 4020gagtgaggaa agaggggagc tagagagaat cctagcagat cttgaggaag aaaacaggaa 4080tctgcaagca gaatatgacc gtctaaagca gcagcacgaa cataaaggcc tgtccccact 4140gccgtcccct cctgaaatga tgcccacctc tccccagagt ccccgggatg ctgagctcat 4200tgctgaggcc aagctactgc gtcaacacaa aggccgcctg gaagccagga tgcaaatcct 4260ggaagaccac aataaacagc tggagtcaca gttacacagg ctaaggcagc tgctggagca 4320accccaggca gaggccaaag tgaatggcac aacggtgtcc tctccttcta cctctctaca 4380gaggtccgac agcagtcagc ctatgctgct ccgagtggtt ggcagtcaaa cttcggactc 
4440catgggtgag gaagatcttc tcagtcctcc ccaggacaca agcacagggt tagaggaggt 4500gatggagcaa ctcaacaact ccttccctag ttcaagagga agaaataccc ctggaaagcc 4560aatgagagag gacacaatgt aggaagtctt ttccacatgg cagatgattt gggcagagcg 4620atggagtcct tagtatcagt catgacagat gaagaaggag cagaataaat gttttacaac 4680tcctgattcc cgcatggttt ttataatatt catacaacaa agaggattag acagtaagag 4740tttacaagaa ataaatctat atttttgtga agggtagtgg tattatactg tagatttcag 4800tagtttctaa gtctgttatt gttttgttaa caatggcagg ttttacacgt ctatgcaatt 4860gtacaaaaaa gttataagaa aactacatgt aaaatcttga tagctaaata acttgccatt 4920tctttatatg gaacgcattt tgggttgttt aaaaatttat aacagttata aagaaagatt 4980gtaaactaaa gtgtgcttta taaaaaaaag ttgtttataa aaacccctaa aaacaaaaca 5040aacacacaca cacacacata cacacacaca cacaaaactt tgaggcagcg cattgttttg 5100catccttttg gcgtgatatc catatgaaat tcatggcttt ttcttttttt gcatattaaa 5160gataagactt cctctaccac cacaccaaat gactactaca cactgctcat ttgagaactg 5220tcagctgagt ggggcaggct tgagttttca tttcatatat ctatatgtct ataagtatat 5280aaatactata gttatataga taaagagata cgaatttcta tagactgact ttttccattt 5340tttaaatgtt catgtcacat cctaatagaa agaaattact tctagtcagt catccaggct 5400tacctgcttg gtctaga 5417 40 5339 DNA Artificial Sequence Synthetic 40gggattccct cactttcccc ctacaggact cagatctggg aggcaattac cttcggagaa 60aaacgaatag gaaaaactga agtgttactt tttttaaagc tgctgaagtt tgttggtttc 120tcattgtttt taagcctact ggagcaataa agtttgaaga acttttacca ggtttttttt 180atcgctgcct tgatatacac ttttcaaaat gctttggtgg gaagaagtag aggactgtta 240tgaaagagaa gatgttcaaa agaaaacatt cacaaaatgg gtaaatgcac aattttctaa 300gtttgggaag cagcatattg agaacctctt cagtgaccta caggatggga ggcgcctcct 360agacctcctc gaaggcctga cagggcaaaa actgccaaaa gaaaaaggat ccacaagagt 420tcatgccctg aacaatgtca acaaggcact gcgggttttg cagaacaata atgttgattt 480agtgaatatt ggaagtactg acatcgtaga tggaaatcat aaactgactc ttggtttgat 540ttggaatata atcctccact ggcaggtcaa aaatgtaatg aaaaatatca tggctggatt 600gcaacaaacc aacagtgaaa agattctcct gagctgggtc cgacaatcaa ctcgtaatta 660tccacaggtt aatgtaatca acttcaccac cagctggtct 
gatggcctgg ctttgaatgc 720tctcatccat agtcataggc cagacctatt tgactggaat agtgtggttt gccagcagtc 780agccacacaa cgactggaac atgcattcaa catcgccaga tatcaattag gcatagagaa 840actactcgat cctgaagatg ttgataccac ctatccagat aagaagtcca tcttaatgta 900catcacatca ctcttccaag ttttgcctca acaagtgagc attgaagcca tccaggaagt 960ggaaatgttg ccaaggccac ctaaagtgac taaagaagaa cattttcagt tacatcatca 1020aatgcactat tctcaacaga tcacggtcag tctagcacag ggatatgaga gaacttcttc 1080ccctaagcct cgattcaaga gctatgccta cacacaggct gcttatgtca ccacctctga 1140ccctacacgg agcccatttc cttcacagca tttggaagct cctgaagaca agtcatttgg 1200cagttcattg atggagagtg aagtaaacct ggaccgttat caaacagctt tagaagaagt 1260attatcgtgg cttctttctg ctgaggacac attgcaagca caaggagaga tttctaatga 1320tgtggaagtg gtgaaagacc agtttcatac tcatgagggg tacatgatgg atttgacagc 1380ccatcagggc cgggttggta atattctaca attgggaagt aagctgattg gaacaggaaa 1440attatcagaa gatgaagaaa ctgaagtaca agagcagatg aatctcctaa attcaagatg 1500ggaatgcctc agggtagcta gcatggaaaa acaaagcaat ttacatcata gattactgca 1560acagttcccc ctggacctgg aaaagtttct tgcctggctt acagaagctg aaacaactgc 1620caatgtccta caggatgcta cccgtaagga aaggctccta gaagactcca agggagtaaa 1680agagctgatg aaacaatggc aagacctcca aggtgaaatt gaagctcaca cagatgttta 1740tcacaacctg gatgaaaaca gccaaaaaat cctgagatcc ctggaaggtt ccgatgatgc 1800agtcctgtta caaagacgtt tggataacat gaacttcaag tggagtgaac ttcggaaaaa 1860gtctctcaac attaggtccc atttggaagc cagttctgac cagtggaagc gtctgcacct 1920ttctctgcag gaacttctgg tgtggctaca gctgaaagat gatgaattaa gccggcaggc 1980acctattgga ggcgactttc cagcagttca gaagcagaac gatgtacata gggccttcaa 2040gagggaattg aaaactaaag aacctgtaat catgagtact cttgagactg tacgaatatt 2100tctgacagag cagcctttgg aaggactaga gaaactctac caggagccca gagagctgcc 2160tcctgaggag agagcccaga atgtcactcg gcttctacga aagcaggctg aggaggtcaa 2220tactgagtgg gaaaaattga acctgcactc cgctgactgg cagagaaaaa tagatgagac 2280ccttgaaaga ctccaggaac ttcaagaggc cacggatgag ctggacctca agctgcgcca 2340agctgaggtg atcaagggat cctggcagcc cgtgggcgat ctcctcattg actctctcca 2400agatcacctc 
gagaaagtca aggcacttcg aggagaaatt gcgcctctga aagagaacgt 2460gagccacgtc aatgaccttg ctcgccagct taccactttg ggcattcagc tctcaccgta 2520taacctcagc actctggaag acctgaacac cagatggaag cttctgcagg tggccgtcga 2580ggaccgagtc aggcagctgc atgaagccca cagggacttt ggtccagcat ctcagcactt 2640tctttccacg tctgtccagg gtccctggga gagagccatc tcgccaaaca aagtgcccta 2700ctatatcaac cacgagactc aaacaacttg ctgggaccat cccaaaatga cagagctcta 2760ccagtcttta gctgacctga ataatgtcag attctcagct tataggactg ccatgaaact 2820ccgaagactg cagaaggccc tttgcttgga tctcttgagc ctgtcagctg catgtgatgc 2880cttggaccag cacaacctca agcaaaatga ccagcccatg gatatcctgc agattattaa 2940ttgtttgacc actatttatg accgcctgga gcaagagcac aacaatttgg tcaacgtccc 3000tctctgcgtg gatatgtgtc tgaactggct gctgaatgtt tatgatacgg gacgaacagg 3060gaggatccgt gtcctgtctt ttaaaactgg catcatttcc ctgtgtaaag cacatttgga 3120agacaagtac agataccttt tcaagcaagt ggcaagttca acaggatttt gtgaccagcg 3180caggctgggc ctccttctgc atgattctat ccaaattcca agacagttgg gtgaagttgc 3240atcctttggg ggcagtaaca ttgagccaag tgtccggagc tgcttccaat ttgctaataa 3300taagccagag atcgaagcgg ccctcttcct agactggatg agactggaac cccagtccat 3360ggtgtggctg cccgtcctgc acagagtggc tgctgcagaa actgccaagc atcaggccaa 3420atgtaacatc tgcaaagagt gtccaatcat tggattcagg tacaggagtc taaagcactt 3480taattatgac atctgccaaa gctgcttttt ttctggtcga gttgcaaaag gccataaaat 3540gcactatccc atggtggaat attgcactcc gactacatca ggagaagatg ttcgagactt 3600tgccaaggta ctaaaaaaca aatttcgaac caaaaggtat tttgcgaagc atccccgaat 3660gggctacctg ccagtgcaga ctgtcttaga gggggacaac atggaaacgc ctgcctcgtc 3720ccctcagctt tcacacgatg atactcattc acgcattgaa cattatgcta gcaggctagc 3780agaaatggaa aacagcaatg gatcttatct aaatgatagc atctctccta atgagagcat 3840agatgatgaa catttgttaa tccagcatta ctgccaaagt ttgaaccagg actcccccct 3900gagccagcct cgtagtcctg cccagatctt gatttcctta gagagtgagg aaagagggga 3960gctagagaga atcctagcag atcttgagga agaaaacagg aatctgcaag cagaatatga 4020ccgtctaaag cagcagcacg aacataaagg cctgtcccca ctgccgtccc ctcctgaaat 4080gatgcccacc tctccccaga gtccccggga tgctgagctc 
attgctgagg ccaagctact 4140gcgtcaacac aaaggccgcc tggaagccag gatgcaaatc ctggaagacc acaataaaca 4200gctggagtca cagttacaca ggctaaggca gctgctggag caaccccagg cagaggccaa 4260agtgaatggc acaacggtgt cctctccttc tacctctcta cagaggtccg acagcagtca 4320gcctatgctg ctccgagtgg ttggcagtca aacttcggac tccatgggtg aggaagatct 4380tctcagtcct ccccaggaca caagcacagg gttagaggag gtgatggagc aactcaacaa 4440ctccttccct agttcaagag gaagaaatac ccctggaaag ccaatgagag aggacacaat 4500gtaggaagtc ttttccacat ggcagatgat ttgggcagag cgatggagtc cttagtatca 4560gtcatgacag atgaagaagg agcagaataa atgttttaca actcctgatt cccgcatggt 4620ttttataata ttcatacaac aaagaggatt agacagtaag agtttacaag aaataaatct 4680atatttttgt gaagggtagt ggtattatac tgtagatttc agtagtttct aagtctgtta 4740ttgttttgtt aacaatggca ggttttacac gtctatgcaa ttgtacaaaa aagttataag 4800aaaactacat gtaaaatctt gatagctaaa taacttgcca tttctttata tggaacgcat 4860tttgggttgt ttaaaaattt ataacagtta taaagaaaga ttgtaaacta aagtgtgctt 4920tataaaaaaa agttgtttat aaaaacccct aaaaacaaaa caaacacaca cacacacaca 4980tacacacaca cacacaaaac tttgaggcag cgcattgttt tgcatccttt tggcgtgata 5040tccatatgaa attcatggct ttttcttttt ttgcatatta aagataagac ttcctctacc 5100accacaccaa atgactacta cacactgctc atttgagaac tgtcagctga gtggggcagg 5160cttgagtttt catttcatat atctatatgt ctataagtat ataaatacta tagttatata 5220gataaagaga tacgaatttc tatagactga ctttttccat tttttaaatg ttcatgtcac 5280atcctaatag aaagaaatta cttctagtca gtcatccagg cttacctgct tggtctaga 5339 415462 DNA Artificial Sequence Synthetic 41 gggattccct cactttccccctacaggact cagatctggg aggcaattac cttcggagaa 60 aaacgaatag gaaaaactgaagtgttactt tttttaaagc tgctgaagtt tgttggtttc 120 tcattgtttt taagcctactggagcaataa agtttgaaga acttttacca ggtttttttt 180 atcgctgcct tgatatacacttttcaaaat gctttggtgg gaagaagtag aggactgtta 240 tgaaagagaa gatgttcaaaagaaaacatt cacaaaatgg gtaaatgcac aattttctaa 300 gtttgggaag cagcatattgagaacctctt cagtgaccta caggatggga ggcgcctcct 360 agacctcctc gaaggcctgacagggcaaaa actgccaaaa gaaaaaggat ccacaagagt 420 tcatgccctg aacaatgtcaacaaggcact gcgggttttg 
cagaacaata atgttgattt 480 agtgaatatt ggaagtactgacatcgtaga tggaaatcat aaactgactc ttggtttgat 540 ttggaatata atcctccactggcaggtcaa aaatgtaatg aaaaatatca tggctggatt 600 gcaacaaacc aacagtgaaaagattctcct gagctgggtc cgacaatcaa ctcgtaatta 660 tccacaggtt aatgtaatcaacttcaccac cagctggtct gatggcctgg ctttgaatgc 720 tctcatccat agtcataggccagacctatt tgactggaat agtgtggttt gccagcagtc 780 agccacacaa cgactggaacatgcattcaa catcgccaga tatcaattag gcatagagaa 840 actactcgat cctgaagatgttgataccac ctatccagat aagaagtcca tcttaatgta 900 catcacatca ctcttccaagttttgcctca acaagtgagc attgaagcca tccaggaagt 960 ggaaatgttg ccaaggccacctaaagtgac taaagaagaa cattttcagt tacatcatca 1020 aatgcactat tctcaacagatcacggtcag tctagcacag ggatatgaga gaacttcttc 1080 ccctaagcct cgattcaagagctatgccta cacacaggct gcttatgtca ccacctctga 1140 ccctacacgg agcccatttccttcacagca tttggaagct cctgaagaca agtcatttgg 1200 cagttcattg atggagagtgaagtaaacct ggaccgttat caaacagctt tagaagaagt 1260 attatcgtgg cttctttctgctgaggacac attgcaagca caaggagaga tttctaatga 1320 tgtggaagtg gtgaaagaccagtttcatac tcatgagggg tacatgatgg atttgacagc 1380 ccatcagggc cgggttggtaatattctaca attgggaagt aagctgattg gaacaggaaa 1440 attatcagaa gatgaagaaactgaagtaca agagcagatg aatctcctaa attcaagatg 1500 ggaatgcctc agggtagctagcatggaaaa acaaagcaat ttacatgctc ctggactgac 1560 cactattgga gcctctcctactcagactgt tactctggtg acacaacctg tggttactaa 1620 ggaaactgcc atctccaaactagaaatgcc atcttccttg atgttggagc atagattact 1680 gcaacagttc cccctggacctggaaaagtt tcttgcctgg cttacagaag ctgaaacaac 1740 tgccaatgtc ctacaggatgctacccgtaa ggaaaggctc ctagaagact ccaagggagt 1800 aaaagagctg atgaaacaatggcaagacct ccaaggtgaa attgaagctc acacagatgt 1860 ttatcacaac ctggatgaaaacagccaaaa aatcctgaga tccctggaag gttccgatga 1920 tgcagtcctg ttacaaagacgtttggataa catgaacttc aagtggagtg aacttcggaa 1980 aaagtctctc aacattaggtcccatttgga agccagttct gaccagtgga agcgtctgca 2040 cctttctctg caggaacttctggtgtggct acagctgaaa gatgatgaat taagccggca 2100 ggcacctatt ggaggcgactttccagcagt tcagaagcag aacgatgtac atagggcctt 2160 caagagggaa 
ttgaaaactaaagaacctgt aatcatgagt actcttgaga ctgtacgaat 2220 atttctgaca gagcagcctttggaaggact agagaaactc taccaggagc ccagagagct 2280 gcctcctgag gagagagcccagaatgtcac tcggcttcta cgaaagcagg ctgaggaggt 2340 caatactgag tgggaaaaattgaacctgca ctccgctgac tggcagagaa aaatagatga 2400 gacccttgaa agactccaggaacttcaaga ggccacggat gagctggacc tcaagctgcg 2460 ccaagctgag gtgatcaagggatcctggca gcccgtgggc gatctcctca ttgactctct 2520 ccaagatcac ctcgagaaagtcaaggcact tcgaggagaa attgcgcctc tgaaagagaa 2580 cgtgagccac gtcaatgaccttgctcgcca gcttaccact ttgggcattc agctctcacc 2640 gtataacctc agcactctggaagacctgaa caccagatgg aagcttctgc aggtggccgt 2700 cgaggaccga gtcaggcagctgcatgaagc ccacagggac tttggtccag catctcagca 2760 ctttctttcc acgtctgtccagggtccctg ggagagagcc atctcgccaa acaaagtgcc 2820 ctactatatc aaccacgagactcaaacaac ttgctgggac catcccaaaa tgacagagct 2880 ctaccagtct ttagctgacctgaataatgt cagattctca gcttatagga ctgccatgaa 2940 actccgaaga ctgcagaaggccctttgctt ggatctcttg agcctgtcag ctgcatgtga 3000 tgccttggac cagcacaacctcaagcaaaa tgaccagccc atggatatcc tgcagattat 3060 taattgtttg accactatttatgaccgcct ggagcaagag cacaacaatt tggtcaacgt 3120 ccctctctgc gtggatatgtgtctgaactg gctgctgaat gtttatgata cgggacgaac 3180 agggaggatc cgtgtcctgtcttttaaaac tggcatcatt tccctgtgta aagcacattt 3240 ggaagacaag tacagataccttttcaagca agtggcaagt tcaacaggat tttgtgacca 3300 gcgcaggctg ggcctccttctgcatgattc tatccaaatt ccaagacagt tgggtgaagt 3360 tgcatccttt gggggcagtaacattgagcc aagtgtccgg agctgcttcc aatttgctaa 3420 taataagcca gagatcgaagcggccctctt cctagactgg atgagactgg aaccccagtc 3480 catggtgtgg ctgcccgtcctgcacagagt ggctgctgca gaaactgcca agcatcaggc 3540 caaatgtaac atctgcaaagagtgtccaat cattggattc aggtacagga gtctaaagca 3600 ctttaattat gacatctgccaaagctgctt tttttctggt cgagttgcaa aaggccataa 3660 aatgcactat cccatggtggaatattgcac tccgactaca tcaggagaag atgttcgaga 3720 ctttgccaag gtactaaaaaacaaatttcg aaccaaaagg tattttgcga agcatccccg 3780 aatgggctac ctgccagtgcagactgtctt agagggggac aacatggaaa cgcctgcctc 3840 gtcccctcag ctttcacacgatgatactca ttcacgcatt 
gaacattatg ctagcaggct 3900 agcagaaatg gaaaacagcaatggatctta tctaaatgat agcatctctc ctaatgagag 3960 catagatgat gaacatttgttaatccagca ttactgccaa agtttgaacc aggactcccc 4020 cctgagccag cctcgtagtcctgcccagat cttgatttcc ttagagagtg aggaaagagg 4080 ggagctagag agaatcctagcagatcttga ggaagaaaac aggaatctgc aagcagaata 4140 tgaccgtcta aagcagcagcacgaacataa aggcctgtcc ccactgccgt cccctcctga 4200 aatgatgccc acctctccccagagtccccg ggatgctgag ctcattgctg aggccaagct 4260 actgcgtcaa cacaaaggccgcctggaagc caggatgcaa atcctggaag accacaataa 4320 acagctggag tcacagttacacaggctaag gcagctgctg gagcaacccc aggcagaggc 4380 caaagtgaat ggcacaacggtgtcctctcc ttctacctct ctacagaggt ccgacagcag 4440 tcagcctatg ctgctccgagtggttggcag tcaaacttcg gactccatgg gtgaggaaga 4500 tcttctcagt cctccccaggacacaagcac agggttagag gaggtgatgg agcaactcaa 4560 caactccttc cctagttcaagaggaagaaa tacccctgga aagccaatga gagaggacac 4620 aatgtaggaa gtcttttccacatggcagat gatttgggca gagcgatgga gtccttagta 4680 tcagtcatga cagatgaagaaggagcagaa taaatgtttt acaactcctg attcccgcat 4740 ggtttttata atattcatacaacaaagagg attagacagt aagagtttac aagaaataaa 4800 tctatatttt tgtgaagggtagtggtatta tactgtagat ttcagtagtt tctaagtctg 4860 ttattgtttt gttaacaatggcaggtttta cacgtctatg caattgtaca aaaaagttat 4920 aagaaaacta catgtaaaatcttgatagct aaataacttg ccatttcttt atatggaacg 4980 cattttgggt tgtttaaaaatttataacag ttataaagaa agattgtaaa ctaaagtgtg 5040 ctttataaaa aaaagttgtttataaaaacc cctaaaaaca aaacaaacac acacacacac 5100 acatacacac acacacacaaaactttgagg cagcgcattg ttttgcatcc ttttggcgtg 5160 atatccatat gaaattcatggctttttctt tttttgcata ttaaagataa gacttcctct 5220 accaccacac caaatgactactacacactg ctcatttgag aactgtcagc tgagtggggc 5280 aggcttgagt tttcatttcatatatctata tgtctataag tatataaata ctatagttat 5340 atagataaag agatacgaatttctatagac tgactttttc cattttttaa atgttcatgt 5400 cacatcctaa tagaaagaaattacttctag tcagtcatcc aggcttacct gcttggtcta 5460 ga 5462 42 8689 DNAArtificial Sequence Synthetic 42 gggattccct cactttcccc ctacaggactcagatctggg aggcaattac cttcggagaa 60 aaacgaatag gaaaaactga 
agtgttactttttttaaagc tgctgaagtt tgttggtttc 120 tcattgtttt taagcctact ggagcaataaagtttgaaga acttttacca ggtttttttt 180 atcgctgcct tgatatacac ttttcaaaatgctttggtgg gaagaagtag aggactgtta 240 tgaaagagaa gatgttcaaa agaaaacattcacaaaatgg gtaaatgcac aattttctaa 300 gtttgggaag cagcatattg agaacctcttcagtgaccta caggatggga ggcgcctcct 360 agacctcctc gaaggcctga cagggcaaaaactgccaaaa gaaaaaggat ccacaagagt 420 tcatgccctg aacaatgtca acaaggcactgcgggttttg cagaacaata atgttgattt 480 agtgaatatt ggaagtactg acatcgtagatggaaatcat aaactgactc ttggtttgat 540 ttggaatata atcctccact ggcaggtcaaaaatgtaatg aaaaatatca tggctggatt 600 gcaacaaacc aacagtgaaa agattctcctgagctgggtc cgacaatcaa ctcgtaatta 660 tccacaggtt aatgtaatca acttcaccaccagctggtct gatggcctgg ctttgaatgc 720 tctcatccat agtcataggc cagacctatttgactggaat agtgtggttt gccagcagtc 780 agccacacaa cgactggaac atgcattcaacatcgccaga tatcaattag gcatagagaa 840 actactcgat cctgaagatg ttgataccacctatccagat aagaagtcca tcttaatgta 900 catcacatca ctcttccaag ttttgcctcaacaagtgagc attgaagcca tccaggaagt 960 ggaaatgttg ccaaggccac ctaaagtgactaaagaagaa cattttcagt tacatcatca 1020 aatgcactat tctcaacaga tcacggtcagtctagcacag ggatatgaga gaacttcttc 1080 ccctaagcct cgattcaaga gctatgcctacacacaggct gcttatgtca ccacctctga 1140 ccctacacgg agcccatttc cttcacagcatttggaagct cctgaagaca agtcatttgg 1200 cagttcattg atggagagtg aagtaaacctggaccgttat caaacagctt tagaagaagt 1260 attatcgtgg cttctttctg ctgaggacacattgcaagca caaggagaga tttctaatga 1320 tgtggaagtg gtgaaagacc agtttcatactcatgagggg tacatgatgg atttgacagc 1380 ccatcagggc cgggttggta atattctacaattgggaagt aagctgattg gaacaggaaa 1440 attatcagaa gatgaagaaa ctgaagtacaagagcagatg aatctcctaa attcaagatg 1500 ggaatgcctc agggtagcta gcatggaaaaacaaagcaat ttacatagag ttttaatgga 1560 tctccagaat cagaaactga aagagttgaatgactggcta acaaaaacag aagaaagaac 1620 aaggaaaatg gaggaagagc ctcttggacctgatcttgaa gacctaaaac gccaagtaca 1680 acaacataag gtgcttcaag aagatctagaacaagaacaa gtcagggtca attctctcac 1740 tcacatggtg gtggtagttg atgaatctagtggagatcac gcaactgctg ctttggaaga 1800 
acaacttaag gtattgggag atcgatgggcaaacatctgt agatggacag aagaccgctg 1860 ggttctttta caagacatcc ttctcaaatggcaacgtctt actgaagaac agtgcctttt 1920 tagtgcatgg ctttcagaaa aagaagatgcagtgaacaag attcacacaa ctggctttaa 1980 agatcaaaat gaaatgttat caagtcttcaaaaactggcc gttttaaaag cggatctaga 2040 aaagaaaaag caatccatgg gcaaactgtattcactcaaa caagatcttc tttcaacact 2100 gaagaataag tcagtgaccc agaagacggaagcatggctg gataactttg cccggtgttg 2160 ggataattta gtccaaaaac ttgaaaagagtacagcacag atttcacagc agcctgacct 2220 agctcctgga ctgaccacta ttggagcctctcctactcag actgttactc tggtgacaca 2280 acctgtggtt actaaggaaa ctgccatctccaaactagaa atgccatctt ccttgatgtt 2340 ggaggtacct gctctggcag atttcaaccgggcttggaca gaacttaccg actggctttc 2400 tctgcttgat caagttataa aatcacagagggtgatggtg ggtgaccttg aggatatcaa 2460 cgagatgatc atcaagcaga aggcaacaatgcaggatttg gaacagaggc gtccccagtt 2520 ggaagaactc attaccgctg cccaaaatttgaaaaacaag accagcaatc aagaggctag 2580 aacaatcatt acggatcgaa ttgaaagaattcagaatcag tgggatgaag tacaagaaca 2640 ccttcagaac cggaggcaac agttgaatgaaatgttaaag gattcaacac aatggctgga 2700 agctaaggaa gaagctgagc aggtcttaggacaggccaga gccaagcttg agtcatggaa 2760 ggagggtccc tatacagtag atgcaatccaaaagaaaatc acagaaacca agcagttggc 2820 caaagacctc cgccagtggc agacaaatgtagatgtggca aatgacttgg ccctgaaact 2880 tctccgggat tattctgcag atgataccagaaaagtccac atgataacag agaatatcaa 2940 tgcctcttgg agaagcattc ataaaagggtgagtgagcga gaggctgctt tggaagaaac 3000 tcatagatta ctgcaacagt tccccctggacctggaaaag tttcttgcct ggcttacaga 3060 agctgaaaca actgccaatg tcctacaggatgctacccgt aaggaaaggc tcctagaaga 3120 ctccaaggga gtaaaagagc tgatgaaacaatggcaagac ctccaaggtg aaattgaagc 3180 tcacacagat gtttatcaca acctggatgaaaacagccaa aaaatcctga gatccctgga 3240 aggttccgat gatgcagtcc tgttacaaagacgtttggat aacatgaact tcaagtggag 3300 tgaacttcgg aaaaagtctc tcaacattaggtcccatttg gaagccagtt ctgaccagtg 3360 gaagcgtctg cacctttctc tgcaggaacttctggtgtgg ctacagctga aagatgatga 3420 attaagccgg caggcaccta ttggaggcgactttccagca gttcagaagc agaacgatgt 3480 acatagggcc ttcaagaggg 
aattgaaaactaaagaacct gtaatcatga gtactcttga 3540 gactgtacga atatttctga cagagcagcctttggaagga ctagagaaac tctaccagga 3600 gcccagagag ctgcctcctg aggagagagcccagaatgtc actcggcttc tacgaaagca 3660 ggctgaggag gtcaatactg agtgggaaaaattgaacctg cactccgctg actggcagag 3720 aaaaatagat gagacccttg aaagactccaggaacttcaa gaggccacgg atgagctgga 3780 cctcaagctg cgccaagctg aggtgatcaagggatcctgg cagcccgtgg gcgatctcct 3840 cattgactct ctccaagatc acctcgagaaagtcaaggca cttcgaggag aaattgcgcc 3900 tctgaaagag aacgtgagcc acgtcaatgaccttgctcgc cagcttacca ctttgggcat 3960 tcagctctca ccgtataacc tcagcactctggaagacctg aacaccagat ggaagcttct 4020 gcaggtggcc gtcgaggacc gagtcaggcagctgcatgaa gcccacaggg actttggtcc 4080 agcatctcag cactttcttt ccacgtctgtccagggtccc tgggagagag ccatctcgcc 4140 aaacaaagtg ccctactata tcaaccacgagactcaaaca acttgctggg accatcccaa 4200 aatgacagag ctctaccagt ctttagctgacctgaataat gtcagattct cagcttatag 4260 gactgccatg aaactccgaa gactgcagaaggccctttgc ttggatctct tgagcctgtc 4320 agctgcatgt gatgccttgg accagcacaacctcaagcaa aatgaccagc ccatggatat 4380 cctgcagatt attaattgtt tgaccactatttatgaccgc ctggagcaag agcacaacaa 4440 tttggtcaac gtccctctct gcgtggatatgtgtctgaac tggctgctga atgtttatga 4500 tacgggacga acagggagga tccgtgtcctgtcttttaaa actggcatca tttccctgtg 4560 taaagcacat ttggaagaca agtacagataccttttcaag caagtggcaa gttcaacagg 4620 attttgtgac cagcgcaggc tgggcctccttctgcatgat tctatccaaa ttccaagaca 4680 gttgggtgaa gttgcatcct ttgggggcagtaacattgag ccaagtgtcc ggagctgctt 4740 ccaatttgct aataataagc cagagatcgaagcggccctc ttcctagact ggatgagact 4800 ggaaccccag tccatggtgt ggctgcccgtcctgcacaga gtggctgctg cagaaactgc 4860 caagcatcag gccaaatgta acatctgcaaagagtgtcca atcattggat tcaggtacag 4920 gagtctaaag cactttaatt atgacatctgccaaagctgc tttttttctg gtcgagttgc 4980 aaaaggccat aaaatgcact atcccatggtggaatattgc actccgacta catcaggaga 5040 agatgttcga gactttgcca aggtactaaaaaacaaattt cgaaccaaaa ggtattttgc 5100 gaagcatccc cgaatgggct acctgccagtgcagactgtc ttagaggggg acaacatgga 5160 aactcccgtt actctgatca acttctggccagtagattct gcgcctgcct 
cgtcccctca 5220 gctttcacac gatgatactc attcacgcattgaacattat gctagcaggc tagcagaaat 5280 ggaaaacagc aatggatctt atctaaatgatagcatctct cctaatgaga gcatagatga 5340 tgaacatttg ttaatccagc attactgccaaagtttgaac caggactccc ccctgagcca 5400 gcctcgtagt cctgcccaga tcttgatttccttagagagt gaggaaagag gggagctaga 5460 gagaatccta gcagatcttg aggaagaaaacaggaatctg caagcagaat atgaccgtct 5520 aaagcagcag cacgaacata aaggcctgtccccactgccg tcccctcctg aaatgatgcc 5580 cacctctccc cagagtcccc gggatgctgagctcattgct gaggccaagc tactgcgtca 5640 acacaaaggc cgcctggaag ccaggatgcaaatcctggaa gaccacaata aacagctgga 5700 gtcacagtta cacaggctaa ggcagctgctggagcaaccc caggcagagg ccaaagtgaa 5760 tggcacaacg gtgtcctctc cttctacctctctacagagg tccgacagca gtcagcctat 5820 gctgctccga gtggttggca gtcaaacttcggactccatg ggtgaggaag atcttctcag 5880 tcctccccag gacacaagca cagggttagaggaggtgatg gagcaactca acaactcctt 5940 ccctagttca agaggaagaa atacccctggaaagccaatg agagaggaca caatgtagga 6000 agtcttttcc acatggcaga tgatttgggcagagcgatgg agtccttagt atcagtcatg 6060 acagatgaag aaggagcaga ataaatgttttacaactcct gattcccgca tggtttttat 6120 aatattcata caacaaagag gattagacagtaagagttta caagaaataa atctatattt 6180 ttgtgaaggg tagtggtatt atactgtagatttcagtagt ttctaagtct gttattgttt 6240 tgttaacaat ggcaggtttt acacgtctatgcaattgtac aaaaaagtta taagaaaact 6300 acatgtaaaa tcttgatagc taaataacttgccatttctt tatatggaac gcattttggg 6360 ttgtttaaaa atttataaca gttataaagaaagattgtaa actaaagtgt gctttataaa 6420 aaaaagttgt ttataaaaac ccctaaaaacaaaacaaaca cacacacaca cacatacaca 6480 cacacacaca aaactttgag gcagcgcattgttttgcatc cttttggcgt gatatccata 6540 tgaaattcat ggctttttct ttttttgcatattaaagata agacttcctc taccaccaca 6600 ccaaatgact actacacact gctcatttgagaactgtcag ctgagtgggg caggcttgag 6660 ttttcatttc atatatctat atgtctataagtatataaat actatagtta tatagataaa 6720 gagatacgaa tttctataga ctgactttttccatttttta aatgttcatg tcacatccta 6780 atagaaagaa attacttcta gtcagtcatccaggcttacc tgcttggtct agaatggatt 6840 tttcccggag ccggaagcca ggaggaaactacaccacact aaaacattgt ctacagctcc 6900 agatgtttct cattttaaac 
aactttccactgacaacgaa agtaaagtaa agtattggat 6960 ttttttaaag ggaacatgtg aatgaatacacaggacttat tatatcagag tgagtaatcg 7020 gttggttggt tgattgattg attgattgatacattcagct tcctgctgct agcaatgcca 7080 cgatttagat ttaatgatgc ttcagtggaaatcaatcaga aggtattctg accttgtgaa 7140 catcagaagg tattttttaa ctcccaagcagtagcaggac gatgataggg ctggagggct 7200 atggattccc agcccatccc tgtgaaggagtaggccactc tttaagtgaa ggattggatg 7260 attgttcata atacataaag ttctctgtaattacaactaa attattatgc cctcttctca 7320 cagtcaaaag gaactgggtg gtttggtttttgttgctttt ttagatttat tgtcccatgt 7380 gggatgagtt tttaaatgcc acaagacataatttaaaata aataaacttt gggaaaaggt 7440 gtaagacagt agccccatca catttgtgatactgacaggt atcaacccag aagcccatga 7500 actgtgtttc catcctttgc atttctctgcgagtagttcc acacaggttt gtaagtaagt 7560 aagaaagaag gcaaattgat tcaaatgttacaaaaaaacc cttcttggtg gattagacag 7620 gttaaatata taaacaaaca aacaaaaattgctcaaaaaa gaggagaaaa gctcaagagg 7680 aaaagctaag gactggtagg aaaaagctttactctttcat gccattttat ttctttttga 7740 tttttaaatc attcattcaa tagataccaccgtgtgacct ataattttgc aaatctgtta 7800 cctctgacat caagtgtaat tagcttttggagagtgggct gacatcaagt gtaattagct 7860 tttggagagt gggttttgtc cattattaataattaattaa ttaacatcaa acacggcttc 7920 tcatgctatt tctacctcac tttggttttggggtgttcct gataattgtg cacacctgag 7980 ttcacagctt caccacttgt ccattgcgttattttctttt tcctttataa ttctttcttt 8040 ttccttcata attttcaaaa gaaaacccaaagctctaagg taacaaatta ccaaattaca 8100 tgaagatttg gtttttgtct tgcatttttttcctttatgt gacgctggac cttttcttta 8160 cccaaggatt tttaaaactc agatttaaaacaaggggtta ctttacatcc tactaagaag 8220 tttaagtaag taagtttcat tctaaaatcagaggtaaata gagtgcataa ataattttgt 8280 tttaatcttt ttgtttttct tttagacacattagctctgg agtgagtctg tcataatatt 8340 tgaacaaaaa ttgagagctt tattgctgcattttaagcat aattaatttg gacattattt 8400 cgtgttgtgt tctttataac caccgagtattaaactgtaa atcataatgt aactgaagca 8460 taaacatcac atggcatgtt ttgtcattgttttcaggtac tgagttctta cttgagtatc 8520 ataatatatt gtgttttaac accaacactgtaacatttac gaattatttt tttaaacttc 8580 agttttactg cattttcaca acatatcagacttcaccaaa tatatgcctt 
actattgtat 8640 tatagtactg ctttactgtg tatctcaataaagcacgcag ttatgttac 8689 43 4181 DNA Homo sapiens 43 ggaactccgcttcgcccgag acccagcgcc caggcgtgtc gcccgagagg agccgcgcga 60 aggtcaccccgcgcccgccg cccgccgccc gccgcctccg tgggtccgtt tgccagtcag 120 cccgtgcgtccgagcccctc gcgccccgcc gcagccccgg ccaaccgagc gccatgaacc 180 agatagagcccggcgtgcag tacaactacg tgtacgacga ggatgagtac atgatccagg 240 aggaggagtgggaccgcgac ctgctcctgg acccagcctg ggagaagcag cagaggaaga 300 ccttcactgcctggtgtaac tcccacctaa ggaaagccgg cacccagatt gagaacatcg 360 aggaagacttcaggaatggc cttaagctca tgctgctttt ggaagtcatc tcaggggaaa 420 ggctgcccaaacctgaccgg ggaaaaatgc ggttccacaa aattgctaat gtcaacaaag 480 ctttggattacatagccagc aaaggggtga aactggtgtc catcggcgct gaagaaattg 540 ttgatggcaatgtgaaaatg accctgggta tgatctggac catcatcctt cgctttgcta 600 ttcaggatatttcggttgaa gaaacatctg ccaaagaagg tctgctgctt tggtgtcaga 660 ggaaaactgctccttataga aatgtgaaca ttcagaactt ccatactagc tggaaagatg 720 gccttggactctgtgccctc atccaccgac accggcctga cctcattgac tactcaaagc 780 ttaacaaggatgaccccata ggaaatatta acctggccat ggaaatcgct gagaagcacc 840 tggatattcctaaaatgttg gatgctgaag acatcgtgaa cacccctaaa cccgatgaaa 900 gagccatcatgacgtacgtc tcttgcttct accacgcttt tgcgggcgcg gagcaggccg 960 agacagcggctaacaggata tgtaaggttc ttgctgtgaa tcaagagaat gagaggctga 1020 tggaagaatatgagaggcta gcgagtgagc ttttggaatg gattcgtcgc acgatcccct 1080 ggctggagaaccggactccc gagaagacca tgcaagccat gcagaagaag ctggaggact 1140 tccgggattaccgccggaag cacaagccac ccaaggtgca ggagaaatgc cagctggaga 1200 tcaacttcaacacgctgcag accaagctgc ggatcagcaa ccgtcctgcc ttcatgccct 1260 ccgagggcaagatggtgtcg gatattgctg gtgcctggca gaggctggag caggctgaga 1320 agggttacgaggagtggttg ctcaatgaga ttcggagact ggagcgcttg gaacacctgg 1380 ctgagaagttcaggcagaag gcctcaacgc acgagacttg ggcttatggc aaagagcaga 1440 tcttgctgcagaaggattac gagtcggcgt cgctgacaga ggtgcgggct ctgctgcgga 1500 agcacgaggcgttcgagagc gacctggcag cgcaccagga ccgcgtggag cagatcgcag 1560 ccatcgcgcaggagctcaat gaactggact atcacgacgc tgtgaatgtc aatgatcggt 1620 gccagaaaatttgtgaccag 
tgggaccgac tgggaacgct tactcagaag aggagagaag 1680 ccctagagagaatggagaaa ttgctagaaa ccattgatca gcttcacctg gagtttgcca 1740 agagggctgctcctttcaac aattggatgg agggcgctat ggaggatctg caagatatgt 1800 tcattgtccacagcattgag gagatccaga gtctgatcac tgcgcatgag cagttcaagg 1860 ccacgctgcccgaggcggac ggagagcggc agtccatcat ggccatccag aacgaggtgg 1920 agaaggtgattcagagctac aacatcagaa tcagctcaag caacccgtac agcactgtca 1980 ccatggatgagctccggacc aagtgggaca aggtgaagca actcgtgccc atccgcgatc 2040 aatccctgcaggaggagctg gctcgccagc atgctaacga gcgtctgagg cgccagtttg 2100 ctgcccaagccaatgccatt gggccctgga tccagaacaa gatggaggag attgcccgga 2160 gctccatccagatcacagga gccctggaag accagatgaa ccagctgaag cagtatgagc 2220 acaacatcatcaactataag aacaacatcg acaagctgga gggagaccat cagctcatcc 2280 aggaggcccttgtctttgac aacaagcaca cgaactacac gatggagcac attcgtgttg 2340 gatgggagctgctgctgaca accatcgcca gaaccatcaa tgaggtggag actcagatcc 2400 tgacgagagatgcgaagggc atcacccagg agcagatgaa tgagttcaga gcctccttca 2460 accactttgacaggaggaag aatggcctga tggatcatga ggatttcaga gcctgcctga 2520 tttccatgggttatgacctg ggtgaagccg aatttgcccg cattatgacc ctggtagatc 2580 ccaacgggcaaggcaccgtc accttccaat ccttcatcga cttcatgact agagagacgg 2640 ctgacaccgacactgccgag caggtcatcg cctccttccg gatcctggct tctgataagc 2700 catacatcctggcggaggag ctgcgtcggg agctgccccc ggatcaggcc cagtactgca 2760 tcaagaggatgcccgcctac tcgggcccag gcagtgtgcc tggtgcactg gattacgctg 2820 cgttctcttccgcactctac ggggagagcg atctgtgatg ctgagcttct gtaatcactc 2880 atcccatcagaatgcaataa aagcggaagt cacagtttgt ttcctggaaa ctttgacaag 2940 ctttattaagttgagagaga gagaggggga aaaaaaaaaa gcctttcgta gttcagtaat 3000 tgccagcaatataacacggc taaaatgaag tttttacagt atatgacata gtgcgcttca 3060 taaataggtttatttctgag tttttagcaa aatgtaatga aatatcaggt tgatttcttt 3120 gattaaacagaacaaattac ttgagtaata ggaaattagg aggatctagg gacagaagga 3180 aagtgaaaaatgtgaaaata caaaataccc aagatttaag accgggggga aaaaaccaca 3240 aattggtaaataaaggtttg ctatttgtaa aaaatttcat ttatctctaa tatgcttatg 3300 tgattggccctaggggagta tatttgggat tctaatgttt tattttcatg 
cttatccaaa 3360 gattactattgtatcttcaa atgaacttaa tattgtgaga tggaactgcc ggggattaaa 3420 aagactacccaaaagatttt tggcacttac aatttttaaa atagtttatg tcatctcttc 3480 attatttagggctggatggt caactcagtc agtgattttt tgatgcttct cttatcctcc 3540 agaatagagacctaaggaca cgtggaagtc agtttaattg ccagagagaa ggatgcaatc 3600 actaggtgaaatgaggtttt taggattatt tattgattcc aggttcccat gctttttgtt 3660 agagcttattagtacaggtt ctcaagagat gaccacataa aagtgctctg tttataaata 3720 agcaggtttctgtagtactg actggttcat cacaaggcaa gtcagaaacc agtatccttc 3780 tagctctccagtcaggactt ccttatgcct ctagttttat gaccggttaa ggagaagcca 3840 gagttagagtaggagaggac taattctcag cagcagtgga ggtgagttct ttcttttgcg 3900 gaagctttacatatgttttg tgtagtagga ataactagat attttagcta gtgtgcggtg 3960 tgtgttcacccctgggattg gacagtgtat cctaacaagt cccatgtctg gttctgtgtc 4020 taaaggcctgctccatgaca caggatgcta catgcactcc tgctagcaca tcttgatctg 4080 ttgaatgttcattctttctt tttgctcata ctgctgtagg ctataattcc cccctgtttt 4140 tccatcttgttgacagcttg tagagaataa agcaggaatt c 4181 44 11443 DNA Artificial SequenceSynthetic 44 gggattccct cactttcccc ctacaggact cagatctggg aggcaattaccttcggagaa 60 aaacgaatag gaaaaactga agtgttactt tttttaaagc tgctgaagtttgttggtttc 120 tcattgtttt taagcctact ggagcaataa agtttgaaga acttttaccaggtttttttt 180 atcgctgcct tgatatacac ttttcaaaat gctttggtgg gaagaagtagaggactgtta 240 tgaaagagaa gatgttcaaa agaaaacatt cacaaaatgg gtaaatgcacaattttctaa 300 gtttgggaag cagcatattg agaacctctt cagtgaccta caggatgggaggcgcctcct 360 agacctcctc gaaggcctga cagggcaaaa actgccaaaa gaaaaaggatccacaagagt 420 tcatgccctg aacaatgtca acaaggcact gcgggttttg cagaacaataatgttgattt 480 agtgaatatt ggaagtactg acatcgtaga tggaaatcat aaactgactcttggtttgat 540 ttggaatata atcctccact ggcaggtcaa aaatgtaatg aaaaatatcatggctggatt 600 gcaacaaacc aacagtgaaa agattctcct gagctgggtc cgacaatcaactcgtaatta 660 tccacaggtt aatgtaatca acttcaccac cagctggtct gatggcctggctttgaatgc 720 tctcatccat agtcataggc cagacctatt tgactggaat agtgtggtttgccagcagtc 780 agccacacaa cgactggaac atgcattcaa catcgccaga tatcaattaggcatagagaa 840 actactcgat 
cctgaagatg ttgataccac ctatccagat aagaagtccatcttaatgta 900 catcacatca ctcttccaag ttttgcctca acaagtgagc attgaagccatccaggaagt 960 ggaaatgttg ccaaggccac ctaaagtgac taaagaagaa cattttcagttacatcatca 1020 aatgcactat tctcaacaga tcacggtcag tctagcacag ggatatgagagaacttcttc 1080 ccctaagcct cgattcaaga gctatgccta cacacaggct gcttatgtcaccacctctga 1140 ccctacacgg agcccatttc cttcacagca tttggaagct cctgaagacaagtcatttgg 1200 cagttcattg atggagagtg aagtaaacct ggaccgttat caaacagctttagaagaagt 1260 attatcgtgg cttctttctg ctgaggacac attgcaagca caaggagagatttctaatga 1320 tgtggaagtg gtgaaagacc agtttcatac tcatgagggg tacatgatggatttgacagc 1380 ccatcagggc cgggttggta atattctaca attgggaagt aagctgattggaacaggaaa 1440 attatcagaa gatgaagaaa ctgaagtaca agagcagatg aatctcctaaattcaagatg 1500 ggaatgcctc agggtagcta gcatggaaaa acaaagcaat ttacatagagttttaatgga 1560 tctccagaat cagaaactga aagagttgaa tgactggcta acaaaaacagaagaaagaac 1620 aaggaaaatg gaggaagagc ctcttggacc tgatcttgaa gacctaaaacgccaagtaca 1680 acaacataag gtgcttcaag aagatctaga acaagaacaa gtcagggtcaattctctcac 1740 tcacatggtg gtggtagttg atgaatctag tggagatcac gcaactgctgctttggaaga 1800 acaacttaag gtattgggag atcgatgggc aaacatctgt agatggacagaagaccgctg 1860 ggttctttta caagacatcc ttctcaaatg gcaacgtctt actgaagaacagtgcctttt 1920 tagtgcatgg ctttcagaaa aagaagatgc agtgaacaag attcacacaactggctttaa 1980 agatcaaaat gaaatgttat caagtcttca aaaactggcc gttttaaaagcggatctaga 2040 aaagaaaaag caatccatgg gcaaactgta ttcactcaaa caagatcttctttcaacact 2100 gaagaataag tcagtgaccc agaagacgga agcatggctg gataactttgcccggtgttg 2160 ggataattta gtccaaaaac ttgaaaagag tacagcacag atttcacaggctgtcaccac 2220 cactcagcca tcactaacac agacaactgt aatggaaaca gtaactacggtgaccacaag 2280 ggaacagatc ctggtaaagc atgctcaaga ggaacttcca ccaccacctccccaaaagaa 2340 gaggcagatt actgtggatt ctgaaattag gaaaaggttg gatgttgatataactgaact 2400 tcacagctgg attactcgct cagaagctgt gttgcagagt cctgaatttgcaatctttcg 2460 gaaggaaggc aacttctcag acttaaaaga aaaagtcaat gccatagagcgagaaaaagc 2520 tgagaagttc agaaaactgc aagatgccag cagatcagct 
caggccctggtggaacagat 2580 ggtgaatgag ggtgttaatg cagatagcat caaacaagcc tcagaacaactgaacagccg 2640 gtggatcgaa ttctgccagt tgctaagtga gagacttaac tggctggagtatcagaacaa 2700 catcatcgct ttctataatc agctacaaca attggagcag atgacaactactgctgaaaa 2760 ctggttgaaa atccaaccca ccaccccatc agagccaaca gcaattaaaagtcagttaaa 2820 aatttgtaag gatgaagtca accggctatc aggtcttcaa cctcaaattgaacgattaaa 2880 aattcaaagc atagccctga aagagaaagg acaaggaccc atgttcctggatgcagactt 2940 tgtggccttt acaaatcatt ttaagcaagt cttttctgat gtgcaggccagagagaaaga 3000 gctacagaca atttttgaca ctttgccacc aatgcgctat caggagaccatgagtgccat 3060 caggacatgg gtccagcagt cagaaaccaa actctccata cctcaacttagtgtcaccga 3120 ctatgaaatc atggagcaga gactcgggga attgcaggct ttacaaagttctctgcaaga 3180 gcaacaaagt ggcctatact atctcagcac cactgtgaaa gagatgtcgaagaaagcgcc 3240 ctctgaaatt agccggaaat atcaatcaga atttgaagaa attgagggacgctggaagaa 3300 gctctcctcc cagctggttg agcattgtca aaagctagag gagcaaatgaataaactccg 3360 aaaaattcag aatcacatac aaaccctgaa gaaatggatg gctgaagttgatgtttttct 3420 gaaggaggaa tggcctgccc ttggggattc agaaattcta aaaaagcagctgaaacagtg 3480 cagactttta gtcagtgata ttcagacaat tcagcccagt ctaaacagtgtcaatgaagg 3540 tgggcagaag ataaagaatg aagcagagcc agagtttgct tcgagacttgagacagaact 3600 caaagaactt aacactcagt gggatcacat gtgccaacag gtctatgccagaaaggaggc 3660 cttgaaggga ggtttggaga aaactgtaag cctccagaaa gatctatcagagatgcacga 3720 atggatgaca caagctgaag aagagtatct tgagagagat tttgaatataaaactccaga 3780 tgaattacag aaagcagttg aagagatgaa gagagctaaa gaagaggcccaacaaaaaga 3840 agcgaaagtg aaactcctta ctgagtctgt aaatagtgtc atagctcaagctccacctgt 3900 agcacaagag gccttaaaaa aggaacttga aactctaacc accaactaccagtggctctg 3960 cactaggctg aatgggaaat gcaagacttt ggaagaatct gttgagaaatggcggcgttt 4020 tcattatgat ataaagatat ttaatcagtg gctaacagaa gctgaacagtttctcagaaa 4080 gacacaaatt cctgagaatt gggaacatgc taaatacaaa tggtatcttaaggaactcca 4140 ggatggcatt gggcagcggc aaactgttgt cagaacattg aatgcaactggggaagaaat 4200 aattcagcaa tcctcaaaaa cagatgccag tattctacag gaaaaattgggaagcctgaa 4260 tctgcggtgg 
caggaggtct gcaaacagct gtcagacaga aaaaagaggctagaagaaca 4320 aaagaatatc ttgtcagaat ttcaaagaga tttaaatgaa tttgttttatggttggagga 4380 agcagataac attgctagta tcccacttga acctggaaaa gagcagcaactaaaagaaaa 4440 gcttgagcaa gtcaagttac tggtggaaga gttgcccctg cgccagggaattctcaaaca 4500 attaaatgaa actggaggac ccgtgcttgt aagtgctccc ataagcccagaagagcaaga 4560 taaacttgaa aataagctca agcagacaaa tctccagtgg ataaaggtttccagagcttt 4620 acctgagaaa caaggagaaa ttgaagctca aataaaagac cttgggcagcttgaaaaaaa 4680 gcttgaagac cttgaagagc agttaaatca tctgctgctg tggttatctcctattaggaa 4740 tcagttggaa atttataacc aaccaaacca agaaggacca tttgacgttcaggaaactga 4800 aatagcagtt caagctaaac aaccggatgt ggaagagatt ttgtctaaagggcagcattt 4860 gtacaaggaa aaaccagcca ctcagccagt gaagaggaag ttagaagatctgagctctga 4920 gtggaaggcg gtaaaccgtt tacttcaaga gctgagggca aagcagcctgacctagctcc 4980 tggactgacc actattggag cctctcctac tcagactgtt actctggtgacacaacctgt 5040 ggttactaag gaaactgcca tctccaaact agaaatgcca tcttccttgatgttggaggt 5100 acctgctctg gcagatttca accgggcttg gacagaactt accgactggctttctctgct 5160 tgatcaagtt ataaaatcac agagggtgat ggtgggtgac cttgaggatatcaacgagat 5220 gatcatcaag cagaaggcaa caatgcagga tttggaacag aggcgtccccagttggaaga 5280 actcattacc gctgcccaaa atttgaaaaa caagaccagc aatcaagaggctagaacaat 5340 cattacggat cgaattgaaa gaattcagaa tcagtgggat gaagtacaagaacaccttca 5400 gaaccggagg caacagttga atgaaatgtt aaaggattca acacaatggctggaagctaa 5460 ggaagaagct gagcaggtct taggacaggc cagagccaag cttgagtcatggaaggaggg 5520 tccctataca gtagatgcaa tccaaaagaa aatcacagaa accaagcagttggccaaaga 5580 cctccgccag tggcagacaa atgtagatgt ggcaaatgac ttggccctgaaacttctccg 5640 ggattattct gcagatgata ccagaaaagt ccacatgata acagagaatatcaatgcctc 5700 ttggagaagc attcataaaa gggtgagtga gcgagaggct gctttggaagaaactcatag 5760 attactgcaa cagttccccc tggacctgga aaagtttctt gcctggcttacagaagctga 5820 aacaactgcc aatgtcctac aggatgctac ccgtaaggaa aggctcctagaagactccaa 5880 gggagtaaaa gagctgatga aacaatggca agacctccaa ggtgaaattgaagctcacac 5940 agatgtttat cacaacctgg atgaaaacag ccaaaaaatc 
ctgagatccctggaaggttc 6000 cgatgatgca gtcctgttac aaagacgttt ggataacatg aacttcaagtggagtgaact 6060 tcggaaaaag tctctcaaca ttaggtccca tttggaagcc agttctgaccagtggaagcg 6120 tctgcacctt tctctgcagg aacttctggt gtggctacag ctgaaagatgatgaattaag 6180 ccggcaggca cctattggag gcgactttcc agcagttcag aagcagaacgatgtacatag 6240 ggccttcaag agggaattga aaactaaaga acctgtaatc atgagtactcttgagactgt 6300 acgaatattt ctgacagagc agcctttgga aggactagag aaactctaccaggagcccag 6360 agagctgcct cctgaggaga gagcccagaa tgtcactcgg cttctacgaaagcaggctga 6420 ggaggtcaat actgagtggg aaaaattgaa cctgcactcc gctgactggcagagaaaaat 6480 agatgagacc cttgaaagac tccaggaact tcaagaggcc acggatgagctggacctcaa 6540 gctgcgccaa gctgaggtga tcaagggatc ctggcagccc gtgggcgatctcctcattga 6600 ctctctccaa gatcacctcg agaaagtcaa ggcacttcga ggagaaattgcgcctctgaa 6660 agagaacgtg agccacgtca atgaccttgc tcgccagctt accactttgggcattcagct 6720 ctcaccgtat aacctcagca ctctggaaga cctgaacacc agatggaagcttctgcaggt 6780 ggccgtcgag gaccgagtca ggcagctgca tgaagcccac agggactttggtccagcatc 6840 tcagcacttt ctttccacgt ctgtccaggg tccctgggag agagccatctcgccaaacaa 6900 agtgccctac tatatcaacc acgagactca aacaacttgc tgggaccatcccaaaatgac 6960 agagctctac cagtctttag ctgacctgaa taatgtcaga ttctcagcttataggactgc 7020 catgaaactc cgaagactgc agaaggccct ttgcttggat ctcttgagcctgtcagctgc 7080 atgtgatgcc ttggaccagc acaacctcaa gcaaaatgac cagcccatggatatcctgca 7140 gattattaat tgtttgacca ctatttatga ccgcctggag caagagcacaacaatttggt 7200 caacgtccct ctctgcgtgg atatgtgtct gaactggctg ctgaatgtttatgatacggg 7260 acgaacaggg aggatccgtg tcctgtcttt taaaactggc atcatttccctgtgtaaagc 7320 acatttggaa gacaagtaca gatacctttt caagcaagtg gcaagttcaacaggattttg 7380 tgaccagcgc aggctgggcc tccttctgca tgattctatc caaattccaagacagttggg 7440 tgaagttgca tcctttgggg gcagtaacat tgagccaagt gtccggagctgcttccaatt 7500 tgctaataat aagccagaga tcgaagcggc cctcttccta gactggatgagactggaacc 7560 ccagtccatg gtgtggctgc ccgtcctgca cagagtggct gctgcagaaactgccaagca 7620 tcaggccaaa tgtaacatct gcaaagagtg tccaatcatt ggattcaggtacaggagtct 7680 aaagcacttt 
aattatgaca tctgccaaag ctgctttttt tctggtcgagttgcaaaagg 7740 ccataaaatg cactatccca tggtggaata ttgcactccg actacatcaggagaagatgt 7800 tcgagacttt gccaaggtac taaaaaacaa atttcgaacc aaaaggtattttgcgaagca 7860 tccccgaatg ggctacctgc cagtgcagac tgtcttagag ggggacaacatggaaactcc 7920 cgttactctg atcaacttct ggccagtaga ttctgcgcct gcctcgtcccctcagctttc 7980 acacgatgat actcattcac gcattgaaca ttatgctagc aggctagcagaaatggaaaa 8040 cagcaatgga tcttatctaa atgatagcat ctctcctaat gagagcatagatgatgaaca 8100 tttgttaatc cagcattact gccaaagttt gaaccaggac tcccccctgagccagcctcg 8160 tagtcctgcc cagatcttga tttccttaga gagtgaggaa agaggggagctagagagaat 8220 cctagcagat cttgaggaag aaaacaggaa tctgcaagca gaatatgaccgtctaaagca 8280 gcagcacgaa cataaaggcc tgtccccact gccgtcccct cctgaaatgatgcccacctc 8340 tccccagagt ccccgggatg ctgagctcat tgctgaggcc aagctactgcgtcaacacaa 8400 aggccgcctg gaagccagga tgcaaatcct ggaagaccac aataaacagctggagtcaca 8460 gttacacagg ctaaggcagc tgctggagca accccaggca gaggccaaagtgaatggcac 8520 aacggtgtcc tctccttcta cctctctaca gaggtccgac agcagtcagcctatgctgct 8580 ccgagtggtt ggcagtcaaa cttcggactc catgggtgag gaagatcttctcagtcctcc 8640 ccaggacaca agcacagggt tagaggaggt gatggagcaa ctcaacaactccttccctag 8700 ttcaagagga agaaataccc ctggaaagcc aatgagagag gacacaatgtaggaagtctt 8760 ttccacatgg cagatgattt gggcagagcg atggagtcct tagtatcagtcatgacagat 8820 gaagaaggag cagaataaat gttttacaac tcctgattcc cgcatggtttttataatatt 8880 catacaacaa agaggattag acagtaagag tttacaagaa ataaatctatatttttgtga 8940 agggtagtgg tattatactg tagatttcag tagtttctaa gtctgttattgttttgttaa 9000 caatggcagg ttttacacgt ctatgcaatt gtacaaaaaa gttataagaaaactacatgt 9060 aaaatcttga tagctaaata acttgccatt tctttatatg gaacgcattttgggttgttt 9120 aaaaatttat aacagttata aagaaagatt gtaaactaaa gtgtgctttataaaaaaaag 9180 ttgtttataa aaacccctaa aaacaaaaca aacacacaca cacacacatacacacacaca 9240 cacaaaactt tgaggcagcg cattgttttg catccttttg gcgtgatatccatatgaaat 9300 tcatggcttt ttcttttttt gcatattaaa gataagactt cctctaccaccacaccaaat 9360 gactactaca cactgctcat ttgagaactg tcagctgagt 
ggggcaggcttgagttttca 9420 tttcatatat ctatatgtct ataagtatat aaatactata gttatatagataaagagata 9480 cgaatttcta tagactgact ttttccattt tttaaatgtt catgtcacatcctaatagaa 9540 agaaattact tctagtcagt catccaggct tacctgcttg gtctagaatggatttttccc 9600 ggagccggaa gccaggagga aactacacca cactaaaaca ttgtctacagctccagatgt 9660 ttctcatttt aaacaacttt ccactgacaa cgaaagtaaa gtaaagtattggattttttt 9720 aaagggaaca tgtgaatgaa tacacaggac ttattatatc agagtgagtaatcggttggt 9780 tggttgattg attgattgat tgatacattc agcttcctgc tgctagcaatgccacgattt 9840 agatttaatg atgcttcagt ggaaatcaat cagaaggtat tctgaccttgtgaacatcag 9900 aaggtatttt ttaactccca agcagtagca ggacgatgat agggctggagggctatggat 9960 tcccagccca tccctgtgaa ggagtaggcc actctttaag tgaaggattggatgattgtt 10020 cataatacat aaagttctct gtaattacaa ctaaattatt atgccctcttctcacagtca 10080 aaaggaactg ggtggtttgg tttttgttgc ttttttagat ttattgtcccatgtgggatg 10140 agtttttaaa tgccacaaga cataatttaa aataaataaa ctttgggaaaaggtgtaaga 10200 cagtagcccc atcacatttg tgatactgac aggtatcaac ccagaagcccatgaactgtg 10260 tttccatcct ttgcatttct ctgcgagtag ttccacacag gtttgtaagtaagtaagaaa 10320 gaaggcaaat tgattcaaat gttacaaaaa aacccttctt ggtggattagacaggttaaa 10380 tatataaaca aacaaacaaa aattgctcaa aaaagaggag aaaagctcaagaggaaaagc 10440 taaggactgg taggaaaaag ctttactctt tcatgccatt ttatttctttttgattttta 10500 aatcattcat tcaatagata ccaccgtgtg acctataatt ttgcaaatctgttacctctg 10560 acatcaagtg taattagctt ttggagagtg ggctgacatc aagtgtaattagcttttgga 10620 gagtgggttt tgtccattat taataattaa ttaattaaca tcaaacacggcttctcatgc 10680 tatttctacc tcactttggt tttggggtgt tcctgataat tgtgcacacctgagttcaca 10740 gcttcaccac ttgtccattg cgttattttc tttttccttt ataattctttctttttcctt 10800 cataattttc aaaagaaaac ccaaagctct aaggtaacaa attaccaaattacatgaaga 10860 tttggttttt gtcttgcatt tttttccttt atgtgacgct ggaccttttctttacccaag 10920 gatttttaaa actcagattt aaaacaaggg gttactttac atcctactaagaagtttaag 10980 taagtaagtt tcattctaaa atcagaggta aatagagtgc ataaataattttgttttaat 11040 ctttttgttt ttcttttaga cacattagct ctggagtgag tctgtcataatatttgaaca 
11100 aaaattgaga gctttattgc tgcattttaa gcataattaa tttggacattatttcgtgtt 11160 gtgttcttta taaccaccga gtattaaact gtaaatcata atgtaactgaagcataaaca 11220 tcacatggca tgttttgtca ttgttttcag gtactgagtt cttacttgagtatcataata 11280 tattgtgttt taacaccaac actgtaacat ttacgaatta tttttttaaacttcagtttt 11340 actgcatttt cacaacatat cagacttcac caaatatatg ccttactattgtattatagt 11400 actgctttac tgtgtatctc aataaagcac gcagttatgt tac 1144345 114 DNA Artificial Sequence Synthetic 45 acgtctgtcc agggtccctgggagagagcc atctcgccaa acaaagtgcc ctactatatc 60 aaccacgaga ctcaaacaacttgctgggac catcccaaaa tgacagagct ctac 114 46 2989 DNA ArtificialSequence Synthetic 46 ctaaattgta agcgttaata ttttgttaaa attcgcgttaaatttttgtt aaatcagctc 60 attttttaac caataggccg aaatcggcaa aatcccttataaatcaaaag aatagaccga 120 gatagggttg agtgttgttc cagtttggaa caagagtccactattaaaga acgtggactc 180 caacgtcaaa gggcgaaaaa ccgtctatca gggcgatggcccactacgtg aaccatcacc 240 ctaatcaagt tttttggggt cgaggtgccg taaagcactaaatcggaacc ctaaagggag 300 cccccgattt agagcttgac ggggaaagcc ggcgaacgtggcgagaaagg aagggaagaa 360 agcgaaagga gcgggcgcta gggcgctggc aagtgtagcggtcacgctgc gcgtaaccac 420 cacacccgcc gcgcttaatg cgccgctaca gggcgcgtcccattcgccat tcaggctgcg 480 caactgttgg gaagggcgat cggtgcgggc ctcttcgctattacgccagc tggcgaaagg 540 gggatgtgct gcaaggcgat taagttgggt aacgccagggttttcccagt cacgacgttg 600 taaaacgacg gccagtgagc gcgcgtaata cgactcactatagggcgaat tggagcttac 660 gtattaatta aggcgccgcg gtggcggccg ctctagaactagtggatccc ccgggctgca 720 ggaattcggc cgcctaggcc acgcgtaagc ttatcgataccgtcgacctc gagggggggc 780 ccggtaccca gcttttgttc cctttagtga gggttaattgcgcgcttggc gtaatcatgg 840 tcatagctgt ttcctgtgtg aaattgttat ccgctcacaattccacacaa catacgagcc 900 ggaagcataa agtgtaaagc ctggggtgcc taatgagtgagctaactcac attaattgcg 960 ttgcgctcac tgcccgcttt ccagtcggga aacctgtcgtgccagctgca ttaatgaatc 1020 ggccaacgcg cggggagagg cggtttgcgt attgggcgctcttccgcttc ctcgctcact 1080 gactcgctgc gctcggtcgt tcggctgcgg cgagcggtatcagctcactc aaaggcggta 1140 atacggttat ccacagaatc aggggataac gcaggaaagaacatgtgagc 
aaaaggccag 1200 caaaaggcca ggaaccgtaa aaaggccgcg ttgctggcgtttttccatag gctccgcccc 1260 cctgacgagc atcacaaaaa tcgacgctca agtcagaggtggcgaaaccc gacaggacta 1320 taaagatacc aggcgtttcc ccctggaagc tccctcgtgcgctctcctgt tccgaccctg 1380 ccgcttaccg gatacctgtc cgcctttctc ccttcgggaagcgtggcgct ttctcatagc 1440 tcacgctgta ggtatctcag ttcggtgtag gtcgttcgctccaagctggg ctgtgtgcac 1500 gaaccccccg ttcagcccga ccgctgcgcc ttatccggtaactatcgtct tgagtccaac 1560 ccggtaagac acgacttatc gccactggca gcagccactggtaacaggat tagcagagcg 1620 aggtatgtag gcggtgctac agagttcttg aagtggtggcctaactacgg ctacactaga 1680 aggacagtat ttggtatctg cgctctgctg aagccagttaccttcggaaa aagagttggt 1740 agctcttgat ccggcaaaca aaccaccgct ggtagcggtggtttttttgt ttgcaagcag 1800 cagattacgc gcagaaaaaa aggatctcaa gaagatcctttgatcttttc tacggggtct 1860 gacgctcagt ggaacgaaaa ctcacgttaa gggattttggtcatgagatt atcaaaaagg 1920 atcttcacct agatcctttt aaattaaaaa tgaagttttaaatcaatcta aagtatatat 1980 gagtaaactt ggtctgacag ttaccaatgc ttaatcagtgaggcacctat ctcagcgatc 2040 tgtctatttc gttcatccat agttgcctga ctccccgtcgtgtagataac tacgatacgg 2100 gagggcttac catctggccc cagtgctgca atgataccgcgagacccacg ctcaccggct 2160 ccagatttat cagcaataaa ccagccagcc ggaagggccgagcgcagaag tggtcctgca 2220 actttatccg cctccatcca gtctattaat tgttgccgggaagctagagt aagtagttcg 2280 ccagttaata gtttgcgcaa cgttgttgcc attgctacaggcatcgtggt gtcacgctcg 2340 tcgtttggta tggcttcatt cagctccggt tcccaacgatcaaggcgagt tacatgatcc 2400 cccatgttgt gcaaaaaagc ggttagctcc ttcggtcctccgatcgttgt cagaagtaag 2460 ttggccgcag tgttatcact catggttatg gcagcactgcataattctct tactgtcatg 2520 ccatccgtaa gatgcttttc tgtgactggt gagtactcaaccaagtcatt ctgagaatag 2580 tgtatgcggc gaccgagttg ctcttgcccg gcgtcaatacgggataatac cgcgccacat 2640 agcagaactt taaaagtgct catcattgga aaacgttcttcggggcgaaa actctcaagg 2700 atcttaccgc tgttgagatc cagttcgatg taacccactcgtgcacccaa ctgatcttca 2760 gcatctttta ctttcaccag cgtttctggg tgagcaaaaacaggaaggca aaatgccgca 2820 aaaaagggaa taagggcgac acggaaatgt tgaatactcatactcttcct ttttcaatat 2880 tattgaagca tttatcaggg 
ttattgtctc atgagcggatacatatttga atgtatttag 2940 aaaaataaac aaataggggt tccgcgcaca tttccccgaaaagtgccac 2989 47 12057 DNA Artificial Sequence Synthetic 47 gggattccctcactttcccc ctacaggact cagatctggg aggcaattac cttcggagaa 60 aaacgaataggaaaaactga agtgttactt tttttaaagc tgctgaagtt tgttggtttc 120 tcattgtttttaagcctact ggagcaataa agtttgaaga acttttacca ggtttttttt 180 atcgctgccttgatatacac ttttcaaaat gctttggtgg gaagaagtag aggactgtta 240 tgaaagagaagatgttcaaa agaaaacatt cacaaaatgg gtaaatgcac aattttctaa 300 gtttgggaagcagcatattg agaacctctt cagtgaccta caggatggga ggcgcctcct 360 agacctcctcgaaggcctga cagggcaaaa actgccaaaa gaaaaaggat ccacaagagt 420 tcatgccctgaacaatgtca acaaggcact gcgggttttg cagaacaata atgttgattt 480 agtgaatattggaagtactg acatcgtaga tggaaatcat aaactgactc ttggtttgat 540 ttggaatataatcctccact ggcaggtcaa aaatgtaatg aaaaatatca tggctggatt 600 gcaacaaaccaacagtgaaa agattctcct gagctgggtc cgacaatcaa ctcgtaatta 660 tccacaggttaatgtaatca acttcaccac cagctggtct gatggcctgg ctttgaatgc 720 tctcatccatagtcataggc cagacctatt tgactggaat agtgtggttt gccagcagtc 780 agccacacaacgactggaac atgcattcaa catcgccaga tatcaattag gcatagagaa 840 actactcgatcctgaagatg ttgataccac ctatccagat aagaagtcca tcttaatgta 900 catcacatcactcttccaag ttttgcctca acaagtgagc attgaagcca tccaggaagt 960 ggaaatgttgccaaggccac ctaaagtgac taaagaagaa cattttcagt tacatcatca 1020 aatgcactattctcaacaga tcacggtcag tctagcacag ggatatgaga gaacttcttc 1080 ccctaagcctcgattcaaga gctatgccta cacacaggct gcttatgtca ccacctctga 1140 ccctacacggagcccatttc cttcacagca tttggaagct cctgaagaca agtcatttgg 1200 cagttcattgatggagagtg aagtaaacct ggaccgttat caaacagctt tagaagaagt 1260 attatcgtggcttctttctg ctgaggacac attgcaagca caaggagaga tttctaatga 1320 tgtggaagtggtgaaagacc agtttcatac tcatgagggg tacatgatgg atttgacagc 1380 ccatcagggccgggttggta atattctaca attgggaagt aagctgattg gaacaggaaa 1440 attatcagaagatgaagaaa ctgaagtaca agagcagatg aatctcctaa attcaagatg 1500 ggaatgcctcagggtagcta gcatggaaaa acaaagcaat ttacatagag ttttaatgga 1560 tctccagaatcagaaactga aagagttgaa tgactggcta 
acaaaaacag aagaaagaac 1620 aaggaaaatggaggaagagc ctcttggacc tgatcttgaa gacctaaaac gccaagtaca 1680 acaacataaggtgcttcaag aagatctaga acaagaacaa gtcagggtca attctctcac 1740 tcacatggtggtggtagttg atgaatctag tggagatcac gcaactgctg ctttggaaga 1800 acaacttaaggtattgggag atcgatgggc aaacatctgt agatggacag aagaccgctg 1860 ggttcttttacaagacatcc ttctcaaatg gcaacgtctt actgaagaac agtgcctttt 1920 tagtgcatggctttcagaaa aagaagatgc agtgaacaag attcacacaa ctggctttaa 1980 agatcaaaatgaaatgttat caagtcttca aaaactggcc gttttaaaag cggatctaga 2040 aaagaaaaagcaatccatgg gcaaactgta ttcactcaaa caagatcttc tttcaacact 2100 gaagaataagtcagtgaccc agaagacgga agcatggctg gataactttg cccggtgttg 2160 ggataatttagtccaaaaac ttgaaaagag tacagcacag atttcacagg ctgtcaccac 2220 cactcagccatcactaacac agacaactgt aatggaaaca gtaactacgg tgaccacaag 2280 ggaacagatcctggtaaagc atgctcaaga ggaacttcca ccaccacctc cccaaaagaa 2340 gaggcagattactgtggatt ctgaaattag gaaaaggttg gatgttgata taactgaact 2400 tcacagctggattactcgct cagaagctgt gttgcagagt cctgaatttg caatctttcg 2460 gaaggaaggcaacttctcag acttaaaaga aaaagtcaat gccatagagc gagaaaaagc 2520 tgagaagttcagaaaactgc aagatgccag cagatcagct caggccctgg tggaacagat 2580 ggtgaatgagggtgttaatg cagatagcat caaacaagcc tcagaacaac tgaacagccg 2640 gtggatcgaattctgccagt tgctaagtga gagacttaac tggctggagt atcagaacaa 2700 catcatcgctttctataatc agctacaaca attggagcag atgacaacta ctgctgaaaa 2760 ctggttgaaaatccaaccca ccaccccatc agagccaaca gcaattaaaa gtcagttaaa 2820 aatttgtaaggatgaagtca accggctatc aggtcttcaa cctcaaattg aacgattaaa 2880 aattcaaagcatagccctga aagagaaagg acaaggaccc atgttcctgg atgcagactt 2940 tgtggcctttacaaatcatt ttaagcaagt cttttctgat gtgcaggcca gagagaaaga 3000 gctacagacaatttttgaca ctttgccacc aatgcgctat caggagacca tgagtgccat 3060 caggacatgggtccagcagt cagaaaccaa actctccata cctcaactta gtgtcaccga 3120 ctatgaaatcatggagcaga gactcgggga attgcaggct ttacaaagtt ctctgcaaga 3180 gcaacaaagtggcctatact atctcagcac cactgtgaaa gagatgtcga agaaagcgcc 3240 ctctgaaattagccggaaat atcaatcaga atttgaagaa attgagggac gctggaagaa 3300 
gctctcctcccagctggttg agcattgtca aaagctagag gagcaaatga ataaactccg 3360 aaaaattcagaatcacatac aaaccctgaa gaaatggatg gctgaagttg atgtttttct 3420 gaaggaggaatggcctgccc ttggggattc agaaattcta aaaaagcagc tgaaacagtg 3480 cagacttttagtcagtgata ttcagacaat tcagcccagt ctaaacagtg tcaatgaagg 3540 tgggcagaagataaagaatg aagcagagcc agagtttgct tcgagacttg agacagaact 3600 caaagaacttaacactcagt gggatcacat gtgccaacag gtctatgcca gaaaggaggc 3660 cttgaagggaggtttggaga aaactgtaag cctccagaaa gatctatcag agatgcacga 3720 atggatgacacaagctgaag aagagtatct tgagagagat tttgaatata aaactccaga 3780 tgaattacagaaagcagttg aagagatgaa gagagctaaa gaagaggccc aacaaaaaga 3840 agcgaaagtgaaactcctta ctgagtctgt aaatagtgtc atagctcaag ctccacctgt 3900 agcacaagaggccttaaaaa aggaacttga aactctaacc accaactacc agtggctctg 3960 cactaggctgaatgggaaat gcaagacttt ggaagaagtt tgggcatgtt ggcatgagtt 4020 attgtcatacttggagaaag caaacaagtg gctaaatgaa gtagaattta aacttaaaac 4080 cactgaaaacattcctggcg gagctgagga aatctctgag gtgctagatt cacttgaaaa 4140 tttgatgcgacattcagagg ataacccaaa tcagattcgc atattggcac agaccctaac 4200 agatggcggagtcatggatg agctaatcaa tgaggaactt gagacattta attctcgttg 4260 gagggaactacatgaagagg ctgtaaggag gcaaaagttg cttgaacaga gcatccagtc 4320 tgcccaggagactgaaaaat ccttacactt aatccaggag tccctcacat tcattgacaa 4380 gcagttggcagcttatattg cagacaaggt ggacgcagct caaatgcctc aggaagccca 4440 gaaaatccaatctgatttga caagtcatga gatcagttta gaagaaatga agaaacataa 4500 tcaggggaaggaggctgccc aaagagtcct gtctcagatt gatgttgcac agaaaaaatt 4560 acaagatgtctccatgaagt ttcgattatt ccagaaacca gccaattttg agctgcgtct 4620 acaagaaagtaagatgattt tagatgaagt gaagatgcac ttgcctgcat tggaaacaaa 4680 gagtgtggaacaggaagtag tacagtcaca gctaaatcat tgtgtgaact tgtataaaag 4740 tctgagtgaagtgaagtctg aagtggaaat ggtgataaag actggacgtc agattgtaca 4800 gaaaaagcagacggaaaatc ccaaagaact tgatgaaaga gtaacagctt tgaaattgca 4860 ttataatgagctgggagcaa aggtaacaga aagaaagcaa cagttggaga aatgcttgaa 4920 attgtcccgtaagatgcgaa aggaaatgaa tgtcttgaca gaatggctgg cagctacaga 4980 tatggaattgacaaagagat cagcagttga 
aggaatgcct agtaatttgg attctgaagt 5040 tgcctggggaaaggctactc aaaaagagat tgagaaacag aaggtgcacc tgaagagtat 5100 cacagaggtaggagaggcct tgaaaacagt tttgggcaag aaggagacgt tggtggaaga 5160 taaactcagtcttctgaata gtaactggat agctgtcacc tcccgagcag aagagtggtt 5220 aaatcttttgttggaatacc agaaacacat ggaaactttt gaccagaatg tggaccacat 5280 cacaaagtggatcattcagg ctgacacact tttggatgaa tcagagaaaa agaaacccca 5340 gcaaaaagaagacgtgctta agcgtttaaa ggcagaactg aatgacatac gcccaaaggt 5400 ggactctacacgtgaccaag cagcaaactt gatggcaaac cgcggtgacc actgcaggaa 5460 attagtagagccccaaatct cagagctcaa ccatcgattt gcagccattt cacacagaat 5520 taagactggaaaggcctcca ttcctttgaa ggaattggag cagtttaact cagatataca 5580 aaaattgcttgaaccactgg aggctgaaat tcagcagggg gtgaatctga aagaggaaga 5640 cttcaataaagatatgaatg aagacaatga gggtactgta aaagaattgt tgcaaagagg 5700 agacaacttacaacaaagaa tcacagatga gagaaagaga gaggaaataa agataaaaca 5760 gcagctgttacagacaaaac ataatgctct caaggatttg aggtctcaaa gaagaaaaaa 5820 ggctctagaaatttctcatc agtggtatca gtacaagagg caggctgatg atctcctgaa 5880 atgcttggatgacattgaaa aaaaattagc cagcctacct gagcccagag atgaaaggaa 5940 aataaaggaaattgatcggg aattgcagaa gaagaaagag gagctgaatg cagtgcgtag 6000 gcaagctgagggcttgtctg aggatggggc cgcaatggca gtggagccaa ctcagatcca 6060 gctcagcaagcgctggcggg aaattgagag caaatttgct cagtttcgaa gactcaactt 6120 tgcacaaattcacactgtcc gtgaagaaac gatgatggtg atgactgaag acatgccttt 6180 ggaaatttcttatgtgcctt ctacttattt gactgaaatc actcatgtct cacaagccct 6240 attagaagtggaacaacttc tcaatgctcc tgacctctgt gctaaggact ttgaagatct 6300 ctttaagcaagaggagtctc tgaagaatat aaaagatagt ctacaacaaa gctcaggtcg 6360 gattgacattattcatagca agaagacagc agcattgcaa agtgcaacgc ctgtggaaag 6420 ggtgaagctacaggaagctc tctcccagct tgatttccaa tgggaaaaag ttaacaaaat 6480 gtacaaggaccgacaagggc gatttgacag atctgttgag aaatggcggc gttttcatta 6540 tgatataaagatatttaatc agtggctaac agaagctgaa cagtttctca gaaagacaca 6600 aattcctgagaattgggaac atgctaaata caaatggtat cttaaggaac tccaggatgg 6660 cattgggcagcggcaaactg ttgtcagaac attgaatgca actggggaag aaataattca 6720 
gcaatcctcaaaaacagatg ccagtattct acaggaaaaa ttgggaagcc tgaatctgcg 6780 gtggcaggaggtctgcaaac agctgtcaga cagaaaaaag aggctagaag aacaaaagaa 6840 tatcttgtcagaatttcaaa gagatttaaa tgaatttgtt ttatggttgg aggaagcaga 6900 taacattgctagtatcccac ttgaacctgg aaaagagcag caactaaaag aaaagcttga 6960 gcaagtcaagttactggtgg aagagttgcc cctgcgccag ggaattctca aacaattaaa 7020 tgaaactggaggacccgtgc ttgtaagtgc tcccataagc ccagaagagc aagataaact 7080 tgaaaataagctcaagcaga caaatctcca gtggataaag gtttccagag ctttacctga 7140 gaaacaaggagaaattgaag ctcaaataaa agaccttggg cagcttgaaa aaaagcttga 7200 agaccttgaagagcagttaa atcatctgct gctgtggtta tctcctatta ggaatcagtt 7260 ggaaatttataaccaaccaa accaagaagg accatttgac gttcaggaaa ctgaaatagc 7320 agttcaagctaaacaaccgg atgtggaaga gattttgtct aaagggcagc atttgtacaa 7380 ggaaaaaccagccactcagc cagtgaagag gaagttagaa gatctgagct ctgagtggaa 7440 ggcggtaaaccgtttacttc aagagctgag ggcaaagcag cctgacctag ctcctggact 7500 gaccactattggagcctctc ctactcagac tgttactctg gtgacacaac ctgtggttac 7560 taaggaaactgccatctcca aactagaaat gccatcttcc ttgatgttgg aggtacctgc 7620 tctggcagatttcaaccggg cttggacaga acttaccgac tggctttctc tgcttgatca 7680 agttataaaatcacagaggg tgatggtggg tgaccttgag gatatcaacg agatgatcat 7740 caagcagaaggcaacaatgc aggatttgga acagaggcgt ccccagttgg aagaactcat 7800 taccgctgcccaaaatttga aaaacaagac cagcaatcaa gaggctagaa caatcattac 7860 ggatcgaattgaaagaattc agaatcagtg ggatgaagta caagaacacc ttcagaaccg 7920 gaggcaacagttgaatgaaa tgttaaagga ttcaacacaa tggctggaag ctaaggaaga 7980 agctgagcaggtcttaggac aggccagagc caagcttgag tcatggaagg agggtcccta 8040 tacagtagatgcaatccaaa agaaaatcac agaaaccaag cagttggcca aagacctccg 8100 ccagtggcagacaaatgtag atgtggcaaa tgacttggcc ctgaaacttc tccgggatta 8160 ttctgcagatgataccagaa aagtccacat gataacagag aatatcaatg cctcttggag 8220 aagcattcataaaagggtga gtgagcgaga ggctgctttg gaagaaactc atagattact 8280 gcaacagttccccctggacc tggaaaagtt tcttgcctgg cttacagaag ctgaaacaac 8340 tgccaatgtcctacaggatg ctacccgtaa ggaaaggctc ctagaagact ccaagggagt 8400 aaaagagctgatgaaacaat ggcaagacct 
ccaaggtgaa attgaagctc acacagatgt 8460 ttatcacaacctggatgaaa acagccaaaa aatcctgaga tccctggaag gttccgatga 8520 tgcagtcctgttacaaagac gtttggataa catgaacttc aagtggagtg aacttcggaa 8580 aaagtctctcaacattaggt cccatttgga agccagttct gaccagtgga agcgtctgca 8640 cctttctctgcaggaacttc tggtgtggct acagctgaaa gatgatgaat taagccggca 8700 ggcacctattggaggcgact ttccagcagt tcagaagcag aacgatgtac atagggcctt 8760 caagagggaattgaaaacta aagaacctgt aatcatgagt actcttgaga ctgtacgaat 8820 atttctgacagagcagcctt tggaaggact agagaaactc taccaggagc ccagagagct 8880 gcctcctgaggagagagccc agaatgtcac tcggcttcta cgaaagcagg ctgaggaggt 8940 caatactgagtgggaaaaat tgaacctgca ctccgctgac tggcagagaa aaatagatga 9000 gacccttgaaagactccagg aacttcaaga ggccacggat gagctggacc tcaagctgcg 9060 ccaagctgaggtgatcaagg gatcctggca gcccgtgggc gatctcctca ttgactctct 9120 ccaagatcacctcgagaaag tcaaggcact tcgaggagaa attgcgcctc tgaaagagaa 9180 cgtgagccacgtcaatgacc ttgctcgcca gcttaccact ttgggcattc agctctcacc 9240 gtataacctcagcactctgg aagacctgaa caccagatgg aagcttctgc aggtggccgt 9300 cgaggaccgagtcaggcagc tgcatgaagc ccacagggac tttggtccag catctcagca 9360 ctttctttccacgtctgtcc agggtccctg ggagagagcc atctcgccaa acaaagtgcc 9420 ctactatatcaaccacgaga ctcaaacaac ttgctgggac catcccaaaa tgacagagct 9480 ctaccagtctttagctgacc tgaataatgt cagattctca gcttatagga ctgccatgaa 9540 actccgaagactgcagaagg ccctttgctt ggatctcttg agcctgtcag ctgcatgtga 9600 tgccttggaccagcacaacc tcaagcaaaa tgaccagccc atggatatcc tgcagattat 9660 taattgtttgaccactattt atgaccgcct ggagcaagag cacaacaatt tggtcaacgt 9720 ccctctctgcgtggatatgt gtctgaactg gctgctgaat gtttatgata cgggacgaac 9780 agggaggatccgtgtcctgt cttttaaaac tggcatcatt tccctgtgta aagcacattt 9840 ggaagacaagtacagatacc ttttcaagca agtggcaagt tcaacaggat tttgtgacca 9900 gcgcaggctgggcctccttc tgcatgattc tatccaaatt ccaagacagt tgggtgaagt 9960 tgcatcctttgggggcagta acattgagcc aagtgtccgg agctgcttcc aatttgctaa 10020 taataagccagagatcgaag cggccctctt cctagactgg atgagactgg aaccccagtc 10080 catggtgtggctgcccgtcc tgcacagagt ggctgctgca gaaactgcca agcatcaggc 
10140 caaatgtaacatctgcaaag agtgtccaat cattggattc aggtacagga gtctaaagca 10200 ctttaattatgacatctgcc aaagctgctt tttttctggt cgagttgcaa aaggccataa 10260 aatgcactatcccatggtgg aatattgcac tccgactaca tcaggagaag atgttcgaga 10320 ctttgccaaggtactaaaaa acaaatttcg aaccaaaagg tattttgcga agcatccccg 10380 aatgggctacctgccagtgc agactgtctt agagggggac aacatggaaa cgcctgcctc 10440 gtcccctcagctttcacacg atgatactca ttcacgcatt gaacattatg ctagcaggct 10500 agcagaaatggaaaacagca atggatctta tctaaatgat agcatctctc ctaatgagag 10560 catagatgatgaacatttgt taatccagca ttactgccaa agtttgaacc aggactcccc 10620 cctgagccagcctcgtagtc ctgcccagat cttgatttcc ttagagagtg aggaaagagg 10680 ggagctagagagaatcctag cagatcttga ggaagaaaac aggaatctgc aagcagaata 10740 tgaccgtctaaagcagcagc acgaacataa aggcctgtcc ccactgccgt cccctcctga 10800 aatgatgcccacctctcccc agagtccccg ggatgctgag ctcattgctg aggccaagct 10860 actgcgtcaacacaaaggcc gcctggaagc caggatgcaa atcctggaag accacaataa 10920 acagctggagtcacagttac acaggctaag gcagctgctg gagcaacccc aggcagaggc 10980 caaagtgaatggcacaacgg tgtcctctcc ttctacctct ctacagaggt ccgacagcag 11040 tcagcctatgctgctccgag tggttggcag tcaaacttcg gactccatgg gtgaggaaga 11100 tcttctcagtcctccccagg acacaagcac agggttagag gaggtgatgg agcaactcaa 11160 caactccttccctagttcaa gaggaagaaa tacccctgga aagccaatga gagaggacac 11220 aatgtaggaagtcttttcca catggcagat gatttgggca gagcgatgga gtccttagta 11280 tcagtcatgacagatgaaga aggagcagaa taaatgtttt acaactcctg attcccgcat 11340 ggtttttataatattcatac aacaaagagg attagacagt aagagtttac aagaaataaa 11400 tctatatttttgtgaagggt agtggtatta tactgtagat ttcagtagtt tctaagtctg 11460 ttattgttttgttaacaatg gcaggtttta cacgtctatg caattgtaca aaaaagttat 11520 aagaaaactacatgtaaaat cttgatagct aaataacttg ccatttcttt atatggaacg 11580 cattttgggttgtttaaaaa tttataacag ttataaagaa agattgtaaa ctaaagtgtg 11640 ctttataaaaaaaagttgtt tataaaaacc cctaaaaaca aaacaaacac acacacacac 11700 acatacacacacacacacaa aactttgagg cagcgcattg ttttgcatcc ttttggcgtg 11760 atatccatatgaaattcatg gctttttctt tttttgcata ttaaagataa gacttcctct 11820 
accaccacaccaaatgacta ctacacactg ctcatttgag aactgtcagc tgagtggggc 11880 aggcttgagttttcatttca tatatctata tgtctataag tatataaata ctatagttat 11940 atagataaagagatacgaat ttctatagac tgactttttc cattttttaa atgttcatgt 12000 cacatcctaatagaaagaaa ttacttctag tcagtcatcc aggcttacct gcttggt 12057 48 22 DNAArtificial Sequence Synthetic 48 gaacaagatt cacacaactg gc 22 49 38 DNAArtificial Sequence Synthetic 49 gttcctggag tctttcaaga tccacagtaatctgcctc 38 50 38 DNA Artificial Sequence Synthetic 50 gaggcagattactgtggatc ttgaaagact ccaggaac 38 51 18 DNA Artificial SequenceSynthetic 51 tgtttggcga gatggctc 18 52 21 DNA Artificial SequenceSynthetic 52 gatgtggaag tggtgaaaga c 21 53 38 DNA Artificial SequenceSynthetic 53 ccaatagtgg tcagtccagg agcatgtaaa ttgctttg 38 54 38 DNAArtificial Sequence Synthetic 54 caaagcaatt tacatgctcc tggactgaccactattgg 38 55 38 DNA Artificial Sequence Synthetic 55 ctgttgcagtaatctatgct ccaacatcaa ggaagatg 38 56 38 DNA Artificial SequenceSynthetic 56 catcttcctt gatgttggag catagattac tgcaacag 38 57 33 DNAArtificial Sequence Synthetic 57 ctgttgcagt aatctatgat gtaaattgct ttg 3358 33 DNA Artificial Sequence Synthetic 58 caaagcaatt tacatcatagattactgcaa cag 33 59 70 DNA Artificial Sequence Synthetic 59 tagcggccgcggtttttttt atcgctgcct tgatatacac tttccaccat gctttggtgg 60 gaagaagtag 7060 19 DNA Artificial Sequence Synthetic 60 ttttcctgtt ccaatcagc 19 61591 DNA Artificial Sequence Synthetic 61 actacgggtc taggctgcccatgtaaggag gcaaggcctg gggacacccg agatgcctgg 60 ttataattaa ccccaacacctgctgccccc ccccccccaa cacctgctgc ctgagcctga 120 gcggttaccc caccccggtgcctgggtctt aggctctgta caccatggag gagaagctcg 180 ctctaaaaat aaccctgtccctggtgggcc caatcaaggc tgtgggggac tgagggcagg 240 ctgtaacagg cttgggggccagggcttata cgtgcctggg actcccaaag tattactgtt 300 ccatgttccc ggcgaagggccagctgtccc ccgccagcta gactcagcac ttagtttagg 360 aaccagtgag caagtcagcccttggggcag cccatacaag gccatggggc tgggcaagct 420 gcacgcctgg gtccggggtgggcacggtgc ccgggcaacg agctgaaagc tcatctgctc 480 tcagggcccc tccctggggacagcccctcc tggctagtca 
caccctgtag gctcctctat 540 ataacccagg ggcacaggggctgcccccgg gtcacgggga tcctctagac c 591 62 26 DNA Artificial SequenceSynthetic 62 agcggccgcg gtactacggg tctagg 26 63 28 DNA ArtificialSequence Synthetic 63 atcggccgtc tagaggatcc ccgtgacc 28 64 19 DNAArtificial Sequence Synthetic 64 tctctccaag atcacctcg 19 65 36 DNAArtificial Sequence Synthetic 65 atgaagcttg cggccgcatg cgggaatcag gagttg36 66 36 DNA Artificial Sequence Synthetic 66 ggcttcctac attgtgtcagtttccatgtt gtcccc 36 67 18 DNA Artificial Sequence Synthetic 67tctctccaag atcacctc 18 68 36 DNA Artificial Sequence Synthetic 68ggggacaaca tggaaactga cacaatgtag gaagcc 36 69 29 DNA Artificial SequenceSynthetic 69 agcggccgca aaaaacctcc cacacctcc 29 70 28 DNA ArtificialSequence Synthetic 70 tacggccgat ccagacatga taagatac 28 71 202 DNAArtificial Sequence Synthetic 71 gatccagaca tgataagata cattgatgagtttggacaaa ccacaactag aatgcagtga 60 aaaaaatgct ttatttgtga aatttgtgatgctattgctt tatttgtaac cattataagc 120 tgcaataaac aagttaacaa caacaattgcattcatttta tgtttcaggt tcagggggag 180 gtgtgggagg ttttttcgga tc 202 72 24DNA Artificial Sequence Synthetic 72 tgtgctgcaa ggcgattaag ttgg 24 73 24DNA Artificial Sequence Synthetic 73 ccaggcttta cactttatgc ttcc 24 74 31DNA Artificial Sequence Synthetic 74 gcacagattt cacagcagcc tgacctagct c31 75 31 DNA Artificial Sequence Synthetic 75 gagctaggtc aggctgctgtgaaatctgtg c 31 76 30 DNA Artificial Sequence Synthetic 76 caagactttggaagatctgt tgagaaatgg 30 77 30 DNA Artificial Sequence Synthetic 77ccatttctca acagatcttc caaagtcttg 30 78 32 DNA Artificial SequenceSynthetic 78 ggaagctcct gaagacgccc acagggactt tg 32 79 32 DNA ArtificialSequence Synthetic 79 caaagtccct gtgggcgtct tcaggagctt cc 32 80 20 DNAArtificial Sequence Synthetic 80 agtgtggttt gccagcagtc 20 81 20 DNAArtificial Sequence Synthetic 81 tggttgatat agtagggcac 20 82 30 DNAArtificial Sequence Synthetic 82 cagatttcac aggctgctct ggcagatttc 30 8312 DNA Artificial Sequence Synthetic 83 aattcgtcga cg 12 84 30 DNAArtificial Sequence Synthetic 84 gaaatctgcc 
agagcagcct gtgaaatctg 30 8530 DNA Artificial Sequence Synthetic 85 tgaatccttt aacataggta cctccaacat30 86 30 DNA Artificial Sequence Synthetic 86 atgttggagg tacctatgttaaaggattca 30 87 206 DNA Mus musculus 87 ccactacggg tctaggctgcccatgtaagg aggcaaggcc tggggacacc cgagatgcct 60 ggttataatt aacccagacatgtggctgcc cccccccccc caacacctgc tgcctgagcc 120 tcacccccac cccggtgcctgggtcttagg ctctgtacac catggaggag aagctcgctc 180 taaaaataac cctgtccctggtggat 206 88 205 DNA Artificial Sequence Synthetic 88 ccactacgggtctaggctgc ccatgtaagg aggcaaggcc tggggacacc cgagatgcct 60 ggttataattaaccccaaca cctgctgccc cccccccccc aacacctgct gcctgagcct 120 cacccccaccccggtgcctg ggtcttaggc tctgtacacc atggaggaga agctcgctct 180 aaaaataaccctgtccctgg tggat 205 89 212 DNA Artificial Sequence Synthetic 89ccactacggg tctaggctgc ccatgtaagg aggcaaggcc tggggacacc cgagatgcct 60ggttataatt aacccagaca tgtggctgcc cccccccccc caacacctgc tgcctgagcc 120tgagcggtta ccccaccccg gtgcctgggt cttaggctct gtacaccatg gaggagaagc 180tcgctctaaa aataaccctg tccctggtgg at 212 90 211 DNA Artificial SequenceSynthetic 90 ccactacggg tctaggctgc ccatgtaagg aggcaaggcc tggggacacccgagatgcct 60 ggttataatt aaccccaaca cctgctgccc cccccccccc aacacctgctgcctgagcct 120 gagcggttac cccaccccgg tgcctgggtc ttaggctctg tacaccatggaggagaagct 180 cgctctaaaa ataaccctgt ccctggtgga t 211 91 170 DNAArtificial Sequence Synthetic 91 ccactacggg tctaggctgc ccatgtaaggaggcaaggcc tggggacacc cgagatgcct 60 ggttataatt aaccccaaca cctgctgccccccccccccc aacacctgct gcctgagcct 120 gagcggttac cccaccccgg tgcctgggtcttaggctctg tacaccatgg 170 92 951 DNA Mus musculus 92 gtggagcagcctgcactggg cttctgggag aaaccaaacc gggttctaac ctttcagcta 60 cagttattgcctttcctgta gatgggcgac tacagcccca cccccacccc cgtctcctgt 120 atccttcctgggcctgggga tcctaggctt tcactggaaa tttcccccca ggtgctgtag 180 gctagagtcacggctcccaa gaacagtgct tgcctggcat gcatggttct gaacctccaa 240 ctgcaaaaaatgacacatac cttgaccctt ggaaggctga ggcaggggga ttgccatgag 300 tgcaaagccagactgggtgg catagttaga ccctgtctca aaaaaccaaa aacaattaaa 360 
taactaaagtcaggcaagta atcctactcg ggagactgag gcagagggat tgttacatgt 420 ctgaggccagcctggactac atagggtttc aggctagccc tgtctacaga gtaaggccct 480 atttcaaaaacacaaacaaa atggttctcc cagctgctaa tgctcaccag gcaatgaagc 540 ctggtgagcattagcaatga aggcaatgaa ggagggtgct ggctacaatc aaggctgtgg 600 gggactgagggcaggctgta acaggcttgg gggccagggc ttatacgtgc ctgggactcc 660 caaagtattactgttccatg ttcccggcga agggccagct gtcccccgcc agctagactc 720 agcacttagtttaggaacca gtgagcaagt cagcccttgg ggcagcccat acaaggccat 780 ggggctgggcaagctgcacg cctgggtccg gggtgggcac ggtgcccggg caacgagctg 840 aaagctcatctgctctcagg ggcccctccc tggggacagc ccctcctggc tagtcacacc 900 ctgtaggctcctctatataa cccaggggca caggggctgc ccccgggtca c 951 93 365 DNA Musmusculus 93 aatcaaggct gtgggggact gagggcaggc tgtaacaggc ttgggggccagggcttatac 60 gtgcctggga ctcccaaagt attactgttc catgttcccg gcgaagggccagctgtcccc 120 cgccagctag actcagcact tagtttagga accagtgagc aagtcagcccttggggcagc 180 ccatacaagg ccatggggct gggcaagctg cacgcctggg tccggggtgggcacggtgcc 240 cgggcaacga gctgaaagct catctgctct caggggcccc tccctggggacagcccctcc 300 tggctagtca caccctgtag gctcctctat ataacccagg ggcacaggggctgcccccgg 360 gtcac 365 94 87 DNA Mus musculus 94 cctccctggg gacagcccctcctggctagt cacaccctgt aggctcctct atataaccca 60 ggggcacagg ggctgcccccgggtcac 87 95 130 DNA Artificial Sequence Synthetic 95 gtaaaacgacggccagtgaa ttcgagctcg gtacccgggg atcctctaga gtcgacctgc 60 aggcatgcaagctttcccta tagtgagtcg tattagagct tggcgtaatc atggtcatag 120 ctgtttcctg130 96 38 PRT Artificial Sequence Synthetic 96 Val Val Ala Leu Ser AsnSer Ser Pro Val Arg Pro Asp Glu Leu Thr 1 5 10 15 Ser Arg Cys Ala HisLeu Ser Glu Arg Tyr His Thr Thr Asn Ser Ser 20 25 30 Pro Thr Ile Met ThrMet 35

We claim:
 1. A composition comprising nucleic acid encoding a mini-dystrophin peptide, wherein said mini-dystrophin peptide comprises a spectrin-like repeat domain comprising n spectrin-like repeats, wherein said mini-dystrophin peptide contains no more than n spectrin-like repeats, and wherein n is an even number that is less than 24 and at least 4.
 2. The composition of claim 1, wherein said spectrin-like repeats are dystrophin spectrin-like repeats.
 3. The composition of claim 2, wherein said dystrophin spectrin-like repeats are human dystrophin spectrin-like repeats.
 4. The composition of claim 1, wherein said mini-dystrophin-peptide is capable of altering a measurable muscle value in a DMD animal model by at least 20% of the wild type value.
 5. The composition of claim 1, wherein said mini-dystrophin peptide is capable of altering a measurable muscle value in a DMD animal model to a level similar to the wild-type value.
 6. The composition of claim 1, wherein n is a multiple of 4.
 7. The composition of claim 1, wherein n is 4.
 8. The composition of claim 1, wherein n is 8.
 9. The composition of claim 1, wherein said nucleic acid comprises an expression vector.
 10. The composition of claim 1, wherein said nucleic acid comprises spectrin-like repeat encoding sequences.
 11. The composition of claim 10, wherein said spectrin-like repeat encoding sequences are precise spectrin-like repeat encoding sequences.
 12. The composition of claim 1, wherein said nucleic acid comprises an actin-binding domain encoding sequence.
 13. The composition of claim 12, wherein said actin binding domain comprises at least a portion of SEQ ID NO:6.
 14. The composition of claim 1, wherein said nucleic acid comprises a β-dystroglycan binding domain.
 15. The composition of claim 14, wherein said β-dystroglycan binding domain comprises at least a portion of a dystrophin hinge 4 encoding sequence, and at least a portion of a dystrophin cysteine-rich domain encoding sequence.
 16. The composition of claim 10, wherein said spectrin-like repeat encoding sequences are selected from the group consisting of SEQ ID NOS:8-10, 12-27, and 29-33.
 17. The composition of claim 1, wherein said nucleic acid contains less than 75% of a wild type dystrophin 5′ untranslated region.
 18. The composition of claim 1, wherein said mini-dystrophin peptide further comprises a substantially deleted dystrophin C-terminal domain.
 19. The composition of claim 1, wherein said nucleic acid sequence contains less than 50% of a dystrophin 3′ untranslated region.
 20. A composition comprising a mini-dystrophin peptide, wherein said mini-dystrophin peptide comprises a spectrin-like repeat domain comprising n spectrin-like repeats, wherein said mini-dystrophin peptide contains no more than n spectrin-like repeats, and wherein n is an even number that is less than 24 and at least 4.
 21. A composition comprising nucleic acid encoding a mini-dystrophin peptide, wherein said mini-dystrophin peptide comprises i) a spectrin-like repeat domain comprising 4 dystrophin spectrin-like repeats, ii) an actin-binding domain, and iii) a β-dystroglycan binding domain; and wherein said mini-dystrophin peptide contains no more than 4 dystrophin spectrin-like repeats.
 22. The composition of claim 21, wherein said mini-dystrophin-peptide is capable of altering a measurable muscle value in a DMD animal model by at least 20% of the wild type value.
 23. The composition of claim 21, wherein said mini-dystrophin peptide is capable of altering a measurable muscle value in a DMD animal model to a level similar to the wild-type value.
 24. The composition of claim 21, wherein said nucleic acid is less than 5.0 kb in length.
 25. A method, comprising; a) providing; i) a vector comprising nucleic acid encoding a mini-dystrophin peptide, wherein said mini-dystrophin peptide comprises a spectrin-like repeat domain comprising n spectrin-like repeats, wherein said mini-dystrophin peptide contains no more than n spectrin-like repeats, and wherein n is an even number that is less than 24 and at least 4, and ii) a subject comprising a target cell, and b) contacting said vector with said subject under conditions such that said mini-dystrophin peptide is expressed in said target cell.
 26. The method of claim 25, wherein said mini-dystrophin peptide further comprises a substantially deleted dystrophin C-terminal domain.
 27. The method of claim 25, wherein said nucleic acid comprises spectrin-like repeat encoding sequences.
 28. The method of claim 27, wherein said spectrin-like repeat encoding sequences are precise spectrin-like repeat encoding sequences.
 29. The method of claim 25, wherein said mini-dystrophin-peptide is capable of altering a measurable muscle value in a DMD animal model by at least 20% of the wild type value.
 30. The method of claim 25, wherein said mini-dystrophin peptide is capable of altering a measurable muscle value in a DMD animal model to a level similar to the wild-type value.
 31. A composition comprising nucleic acid, wherein said nucleic acid encodes a mini-dystrophin peptide, and wherein said mini-dystrophin peptide comprises a substantially deleted dystrophin C-terminal domain.
 32. The composition of claim 31, wherein said substantially deleted dystrophin C-terminal domain is less than 40% of a wild type dystrophin C-terminal domain.
 33. The composition of claim 31, wherein said mini-dystrophin-peptide is capable of altering a measurable muscle value in a DMD animal model by at least 20% of the wild type value.
 34. The composition of claim 31, wherein said mini-dystrophin peptide is capable of altering a measurable muscle value in a DMD animal model to a level similar to the wild-type value.